## Scalable sweeping-based spatial join (1998)

Venue: | IN PROC. 24TH INT. CONF. VERY LARGE DATA BASES, VLDB |

Citations: | 67 - 8 self |

### BibTeX

@INPROCEEDINGS{Arge98scalablesweeping-based,

author = {Lars Arge and Octavian Procopiuc and Sridhar Ramaswamy and Torsten Suel and Jeffrey Scott Vitter},

title = {Scalable sweeping-based spatial join},

booktitle = {IN PROC. 24TH INT. CONF. VERY LARGE DATA BASES, VLDB},

year = {1998},

publisher = {}

}

### Abstract

In this paper, we consider the filter step of the spatial join problem, for the case where neither of the inputs is indexed. We present a new algorithm, Scalable Sweeping-Based Spatial Join (SSSJ), that achieves both efficiency on real-life data and robustness against highly skewed and worst-case data sets. The algorithm combines a method with theoretically optimal bounds on I/O transfers, based on the recently proposed distribution-sweeping technique, with a highly optimized implementation of internal-memory plane-sweeping. We present experimental results based on an efficient implementation of the SSSJ algorithm, and compare it to the state-of-the-art Partition-Based Spatial-Merge (PBSM) algorithm of Patel and DeWitt.
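The filter step described in the abstract can be illustrated with a minimal internal-memory plane sweep over minimum bounding rectangles. This is only a sketch under stated assumptions: the function name `sweep_join`, the `(xlo, ylo, xhi, yhi)` tuple layout, and the use of plain Python lists for the sweepline structure are all hypothetical choices made here, not the authors' optimized implementation, and the sketch does not include the distribution-sweeping (external-memory) part of SSSJ.

```python
from typing import List, Tuple

# A rectangle (MBR) is (xlo, ylo, xhi, yhi); closed intervals are assumed,
# so rectangles that merely touch are reported as intersecting.
Rect = Tuple[float, float, float, float]


def sweep_join(p: List[Rect], q: List[Rect]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) such that p[i] intersects q[j]."""
    # One event per rectangle, keyed by its lower x-coordinate.
    events = [(r[0], 0, i) for i, r in enumerate(p)]
    events += [(r[0], 1, j) for j, r in enumerate(q)]
    events.sort()

    sets = (p, q)
    # active[s] holds indices of rectangles from set s that may still be
    # cut by the sweepline (a plain list stands in for an interval structure).
    active: Tuple[List[int], List[int]] = ([], [])
    out: List[Tuple[int, int]] = []
    for x, side, idx in events:
        rect = sets[side][idx]
        other = 1 - side
        # Drop rectangles of the other set that end left of the sweepline.
        active[other][:] = [k for k in active[other] if sets[other][k][2] >= x]
        # Survivors overlap rect in x; report those that also overlap in y.
        for k in active[other]:
            o = sets[other][k]
            if o[1] <= rect[3] and rect[1] <= o[3]:
                out.append((idx, k) if side == 0 else (k, idx))
        active[side].append(idx)
    return out
```

Calling `sweep_join(p, q)` on two lists of rectangles returns each intersecting pair exactly once, reported when the rectangle with the larger lower x-coordinate is reached. The list-based active sets keep the sketch short but cost time linear in the number of active rectangles per event; a real implementation would use a dynamic interval structure instead.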

### Citations

9158 | Introduction to Algorithms
- Cormen, Leiserson, et al.
- 1998
Citation Context ...ination of an interval tree [Ede83] and a skip list [Pug90]. More precisely, we used a simplified dynamic version of the interval tree similar to that described in Section 15.3 and Exercise 15.3–4 of =-=[CLR90]-=-, but implemented the structure using a randomized skip list instead of a balanced tree structure. (Another, though somewhat different, structure combining interval trees and skip lists has been descr... |

2410 | R-trees: a dynamic index structure for spatial searching
- Guttman
- 1984
Citation Context ...ez [Val87]. The join index used in [Rot91] partially computes the result of the spatial join using a grid file. There has recently been much interest in using spatial index structures like the R-tree =-=[Gut85]-=-, R+-tree [SRF87], R*-tree [BKSS90], and PMR quadtree [Sam89] to speed up the filter step of the spatial join. Brinkhoff, Kriegel, and Seeger [BKS93] propose a spatial join algorithm based on R*-trees.... |

1877 |
Computational Geometry: An Introduction
- Preparata, Shamos
- 1985
Citation Context ...le intersection problem can be solved in main memory by applying a technique called Plane Sweeping. Plane sweeping is one of the most basic algorithmic paradigms in computational geometry (see, e.g., =-=[PS85]-=-). Simply speaking, a plane-sweeping (or sweepline) algorithm attempts to solve a geometric problem by moving a vertical or horizontal sweepline across the scene, processing objects as they are reache... |

1083 | The R∗-tree: an efficient and robust access method for points and rectangles - Beckmann, Kriegel, et al. - 1990 |

563 |
The input/output complexity of sorting and related problems
- Aggarwal, Vitter
- 1988
Citation Context ...ts and theoretical framework developed in [APR+98]. The algorithm uses the distribution-sweeping technique developed in [GTVV93] and further developed in [Arg95, AVV98]. Following Aggarwal and Vitter =-=[AV88]-=- we use the following I/O-model: We make the assumption that each access to disk transmits one disk block with B units of data, and we count this as one I/O operation. We denote the total amount of main... |

404 | The grid file: an adaptable, symmetric multikey file structure
- Nievergelt, Hinterberger, et al.
- 1984
Citation Context ...s of spatial objects (which are rectangles in two dimensions) are transformed into points in four dimensions. The resulting points are stored in a multi-attribute data structure such as the grid file =-=[NHS84]-=-, which is then used for the filter step. Rotem [Rot91] proposes a spatial join algorithm based on the join index of Valduriez [Val87]. The join index used in [Rot91] partially computes the result of ... |

343 | Skip lists: a probabilistic alternative to balanced trees
- Pugh
- 1990
Citation Context ...n PBSM as Forward Sweep. In the following we discuss each of these algorithms. Algorithm Tree Sweep uses a data structure that is essentially a combination of an interval tree [Ede83] and a skip list =-=[Pug90]-=-. More precisely, we used a simplified dynamic version of the interval tree similar to that described in Section 15.3 and Exercise 15.3–4 of [CLR90], but implemented the structure using a randomized s... |

341 | Parallel processing of spatial joins using R-trees
- Brinkhoff, Kriegel, et al.
- 1996
Citation Context ...t in using spatial index structures like the R-tree [Gut85], R+-tree [SRF87], R*-tree [BKSS90], and PMR quadtree [Sam89] to speed up the filter step of the spatial join. Brinkhoff, Kriegel, and Seeger =-=[BKS93]-=- propose a spatial join algorithm based on R*-trees. Their algorithm is a carefully synchronized depth-first traversal of the two trees to be joined. An improvement of this algorithm was recently repo... |

305 | The R+-tree: A dynamic index for multi-dimensional objects
- Sellis, Roussopoulos, et al.
- 1987
Citation Context ... index used in [Rot91] partially computes the result of the spatial join using a grid file. There has recently been much interest in using spatial index structures like the R-tree [Gut85], R+-tree =-=[SRF87]-=-, R*-tree [BKSS90], and PMR quadtree [Sam89] to speed up the filter step of the spatial join. Brinkhoff, Kriegel, and Seeger [BKS93] propose a spatial join algorithm based on R*-trees. Their algor... |

297 | The R+-tree: a dynamic index for multi-dimensional objects
- Sellis, Roussopoulos, et al.
- 1987
Citation Context ...join index used in [Rot91] partially computes the result of the spatial join using a grid file. There has recently been much interest in using spatial index structures like the R-tree [Gut85], R+-tree =-=[SRF87]-=-, R*-tree [BKSS90], and PMR quadtree [Sam89] to speed up the filter step of the spatial join. Brinkhoff, Kriegel, and Seeger [BKS93] propose a spatial join algorithm based on R*-trees. Their algorithm... |

216 | Join Indices
- Valduriez
- 1987
Citation Context ... stored in a multi-attribute data structure such as the grid file [NHS84], which is then used for the filter step. Rotem [Rot91] proposes a spatial join algorithm based on the join index of Valduriez =-=[Val87]-=-. The join index used in [Rot91] partially computes the result of the spatial join using a grid file. There has recently been much interest in using spatial index structures like the R-tree [Gut85], R... |

183 |
Priority search trees
- McCreight
- 1985
Citation Context ...er the sweepline has passed over them. Many optimal and suboptimal dynamic data structures for intervals have been proposed; important examples are the interval tree [Ede83], the priority search tree =-=[McC85]-=-, and the segment tree [Ben77]. 4.2 The Square-Root Rule In most implementations of plane-sweeping algorithms, the maximum amount of memory ever needed is determined by the maximum number of rectangle... |

181 | Spatial query processing in an object-oriented database system - Orenstein - 1986 |

177 | Partition based spatial-merge join
- Patel, DeWitt
- 1996
Citation Context ...n both relations, these indices are commonly used in the implementation of the spatial join. In this paper, we focus on the case in which neither of the inputs to the join is indexed. As discussed in =-=[PD96]-=- such cases arise when the relations to be joined are intermediate results, and in a parallel database environment where inputs are coming in from multiple processors. 1.1 Summary of this Paper We pre... |

158 | The buffer tree: A new technique for optimal I/O-algorithms - Arge - 1995 |

123 | External-Memory Computational Geometry
- Goodrich, Tsay, et al.
- 1993
Citation Context ...cribed in the next section. We point out that this section is based on the results and theoretical framework developed in [APR+98]. The algorithm uses the distribution-sweeping technique developed in =-=[GTVV93]-=- and further developed in [Arg95, AVV98]. Following Aggarwal and Vitter [AV88] we use the following I/O-model: We make the assumption that each access to disk transmits one disk block with B units of da... |

109 | PROBE spatial data modeling and query processing in an image database application - Orenstein, Manola - 1988 |

106 |
Spatial joins using seeded trees
- Lo, Ravishankar
- 1994
Citation Context ...n indices and spatial indices for the spatial join. Hoel and Samet [HS92] propose the use of PMR quadtrees for the spatial join and compare it against members of the R-tree family. Lo and Ravishankar =-=[LR94]-=- discuss the case where exactly one of the relations does not have an index. They construct an index for that relation on the fly, by using the index on the other relation as a starting point (the see... |

101 | Spatial hash-joins
- Lo, Ravishankar
- 1996
Citation Context ... then use the tree join algorithm of [BKS93] for computing the join. Another recent paper [KS97] proposes an algorithm based on a filter tree structure. Patel and DeWitt [PD96] and Lo and Ravishankar =-=[LR96]-=- both propose hash-based spatial join algorithms that use a spatial partitioning function to subdivide the input, such that each partition fits entirely in memory. Patel and DeWitt then use a plane-sw... |

95 | Efficient computation of spatial joins
- Günther
- 1993
Citation Context ...rithm was recently reported in [HJR97]. (Another interesting technique for efficiently traversing a multi-dimensional index structure was proposed in [KHT89] in a slightly different context.) Günther =-=[Gün93]-=- studies the tradeoffs between using join indices and spatial indices for the spatial join. Hoel and Samet [HS92] propose the use of PMR quadtrees for the spatial join and compare it against members o... |

89 | Spatial joins using R-trees: breadth-first traversal with global optimizations
- Huang, Jing, et al.
- 1997
Citation Context ...nal index structure was proposed in [KHT89] in a slightly different context.) Günther [Gün93] studies the tradeoffs between using join indices and spatial indices for the spatial join. Hoel and Samet =-=[HS92]-=- propose the use of PMR quadtrees for the spatial join and compare it against members of the R-tree family. Lo and Ravishankar [LR94] discuss the case where exactly one of the relations does not have ... |

78 | External-memory algorithms for processing line segments in geographic information systems - Arge, Vengroff, et al. - 1995 |

76 | Redundancy in spatial databases - Orenstein - 1989 |

75 |
Spatial join indices
- Rotem
- 1991
Citation Context ...sions) are transformed into points in four dimensions. The resulting points are stored in a multi-attribute data structure such as the grid file [NHS84], which is then used for the filter step. Rotem =-=[Rot91]-=- proposes a spatial join algorithm based on the join index of Valduriez [Val87]. The join index used in [Rot91] partially computes the result of the spatial join using a grid file. There has recently ... |

65 | AlphaSort: A RISC Machine Sort - Nyberg, Barclay, et al. - 1994 |

64 | Size separation spatial join
- Koudas, Sevcik
- 1997
Citation Context ...n indices and spatial indices for the spatial join. Hoel and Samet [HS92] propose the use of PMR quadtrees for the spatial join and compare it against members of the R-tree family. Lo and Ravishankar =-=[LR94]-=- discuss the case where exactly one of the relations does not have an index. They construct an index for that relation on the fly, by using the index on the other relation as a starting point (the see... |

60 |
A comparison of spatial query processing techniques for native and parameter spaces
- Orenstein
- 1990
Citation Context ...lest axis-parallel rectangle that completely contains it. This rectangle is referred to as the spatial object’s minimum bounding rectangle (MBR). Spatial operations can then be performed in two steps =-=[Ore90]-=-: Filter Step: The spatial operation is performed on the approximate representation, such as the MBR. For example, when joining two spatial relations, the first step is to identify all intersecting pa... |

57 |
Algorithms for Klee’s rectangle problems
- Bentley
- 1977
Citation Context ...er them. Many optimal and suboptimal dynamic data structures for intervals have been proposed; important examples are the interval tree [Ede83], the priority search tree [McC85], and the segment tree =-=[Ben77]-=-. 4.2 The Square-Root Rule In most implementations of plane-sweeping algorithms, the maximum amount of memory ever needed is determined by the maximum number of rectangles that are intersected by a si... |

49 | A qualitative comparison study of data structures for large linear segment databases
- Hoel, Samet
- 1992
Citation Context ...l index structure was proposed in [KHT89] in a slightly different context.) Günther [Gün93] studies the tradeoffs between using join indices and spatial indices for the spatial join. Hoel and Samet =-=[HS92]-=- propose the use of PMR quadtrees for the spatial join and compare it against members of the R-tree family. Lo and Ravishankar [LR94] discuss the case where exactly one of the relations does not have ... |

48 |
Rule-Based Optimization and Query Processing in an Extensible Geometric Database System
- Becker, Güting
- 1992
Citation Context ...O(√N). This observation, which is known as the square-root rule in the VLSI literature (see, e.g., [GS87]), seems to have been largely overlooked in the spatial join literature (with the exception of =-=[BG92]-=-). It implies that for most real-life data sets, we can bypass the vertical partitioning step in our algorithm and directly perform the [footnote 1: with the exception of the work in [GS87]] plane-sweeping algorit... |

45 |
A new approach to rectangle intersections: part I
- Edelsbrunner
- 1983
Citation Context ... is performed, and are removed after the sweepline has passed over them. Many optimal and suboptimal dynamic data structures for intervals have been proposed; important examples are the interval tree =-=[Ede83]-=-, the priority search tree [McC85], and the segment tree [Ben77]. 4.2 The Square-Root Rule In most implementations of plane-sweeping algorithms, the maximum amount of memory ever needed is determined ... |

40 | A Super Scalar Sort Algorithm for RISC Processors
- Agarwal
- 1996
Citation Context ...le in Q. We will show that the following algorithm performs O(n log_m n + t) I/O operations, and thus asymptotically matches the lower bound implied by the sorting lower bound of [AV88] (see also =-=[AM]-=-). It can be shown that the algorithm is also optimal in terms of CPU time. We again assume that at the beginning of the algorithm, P and Q have already been sorted into one list L of rectangles by their lo... |

37 | I/O-efficient scientific computation using TPIE - Vengroff, Vitter - 1996 |

35 | A transparent parallel I/O environment - Vengroff - 1994 |

29 |
A new algorithm for computing joins with grid files
- Becker, Hinrichs, et al.
- 1993
Citation Context ... sensitive to the size of the pixels chosen, in that smaller pixels lead to better filtering, but also increase the number of pixels associated with each object. In another transformational approach =-=[BHF93]-=-, the MBRs of spatial objects (which are rectangles in two dimensions) are transformed into points in four dimensions. The resulting points are stored in a multi-attribute data structure such as the g... |

29 |
The Montage Extensible DataBlade Architecture
- Ubell
- 1994
Citation Context ...d research database communities over the last decade. Several commercial products that manage spatial data are available. These include ESRI’s ARC/INFO [ARC93], InterGraph’s MGE [Int97], and Informix =-=[Ube94]-=-. GISs typically store and manage spatial data such as points, lines, poly-lines, polygons, and surfaces. Since the amount of data they manage is quite large, GISs are often disk-based systems. An extr... |

26 | External-memory algorithms with applications in geographic information systems - Arge - 1997 |

26 | Experiments on the practical I/O efficiency of geometric algorithms: Distribution sweep vs. plane sweep
- Chiang
- 1995
Citation Context ...ata distributions on performance, we compared the behavior of the three algorithms on synthetically generated data sets. We generated two data sets of skewed rectangles, following a procedure used in =-=[Chi95]-=-. Each data set contains two... [Figure 8: Running times for wide re...; x-axis: number of rectangles; y-axis: time (seconds); series: MPBSM, QPBSM, SSSJ] |

25 |
Generating seeded trees from data sets
- Lo, Ravishankar
- 1995
Citation Context ...of [BKS93] is used to perform the actual join. The other major direction for research on the spatial join has focused on the case where neither of the input relations has an index. Lo and Ravishankar =-=[LR95]-=- propose to first build indices for the relations on the fly using spatial sampling techniques and then use the tree join algorithm of [BKS93] for computing the join. Another recent paper [KS97] propo... |

24 | The interval skip list: a data structure for finding all intervals that overlap a point. Computer Science and Engineering
- Hanson
- 1991
Citation Context ...plemented the structure using a randomized skip list instead of a balanced tree structure. (Another, though somewhat different, structure combining interval trees and skip lists has been described in =-=[Han91]-=-.) Our reason for using a skip list is that it allows for a fairly simple but efficient implementation while matching (in a probabilistic sense) the good worst-case behavior of a balanced tree. With t... |

23 | On Showing Lower Bounds for External-Memory Computational Geometry Problems. External Memory Algorithms and Visualization
- Arge, Miltersen
- 1999
Citation Context ... in Q. We will show that the following algorithm performs O(n log_m n + t) I/O operations, and thus asymptotically matches the lower bound implied by the sorting lower bound of [AV88] (see also =-=[AM]-=-). It can be shown that the algorithm is also optimal in terms of CPU time. We again assume that at the beginning of the algorithm, P and Q have already been sorted into one list L of rectangles by thei... |

21 | Theory and practice of I/O-efficient algorithms for multidimensional batched searching problems - Arge, Procopiuc, et al. - 1998 |

18 | TPIE User Manual and Reference - Vengroff - 1995 |

16 |
Technical Documentation
- Files
- 1997
Citation Context ... of 4 as compared to the original implementation of Patel and DeWitt [PD96]. The data we used is a standard benchmark data for the spatial join, namely the Tiger/Line data from the US Bureau of Census =-=[Tig92]-=-. Our experiments showed that SSSJ performs at least 25% better than the original PBSM. On the other hand, the improved version of PBSM actually performed about 10% better than SSSJ on the real-life data ... |

16 |
A practical divide-and-conquer algorithm for the rectangle intersection problem
- Güting, Schilling
- 1987
Citation Context ...bserved that on all our realistic data sets of size N, this sweepline structure never grew beyond size O(√N). This observation, which is known as the square-root rule in the VLSI literature (see, e.g., =-=[GS87]-=-), seems to have been largely overlooked in the spatial join literature (with the exception of [BG92]). It implies that for most real-life data sets, we can bypass the vertical partitioning step in ou... |

14 |
The Design and Analysis of Spatial Data Structures
- Samet
- 1989
Citation Context ...s the result of the spatial join using a grid file. There has recently been much interest in using spatial index structures like the R-tree [Gut85], R+-tree [SRF87], R*-tree [BKSS90], and PMR quadtree =-=[Sam89]-=- to speed up the filter step of the spatial join. Brinkhoff, Kriegel, and Seeger [BKS93] propose a spatial join algorithm based on R*-trees. Their algorithm is a carefully synchronized depth-first tra... |

12 |
Join strategies on KD-tree indexed relations
- Kitsuregawa, Harada, et al.
- 1989
Citation Context ...wo trees to be joined. An improvement of this algorithm was recently reported in [HJR97]. (Another interesting technique for efficiently traversing a multi-dimensional index structure was proposed in =-=[KHT89]-=- in a slightly different context.) Günther [Gün93] studies the tradeoffs between using join indices and spatial indices for the spatial join. Hoel and Samet [HS92] propose the use of PMR quadtrees for... |

11 |
Understanding GIS—the ARC/INFO method. ARC/INFO
- ARCINFO
- 1993
Citation Context ...generated enormous interest in the commercial and research database communities over the last decade. Several commercial products that manage spatial data are available. These include ESRI’s ARC/INFO =-=[ARC93]-=-, InterGraph’s MGE [Int97], and Informix [Ube94]. GISs typically store and manage spatial data such as points, lines, poly-lines, polygons, and surfaces. Since the amount of data they manage is quite ... |

6 | High-performance sorting on network of workstations - Dusseau, Arpaci, et al. - 1997 |

5 | A practical divide-and-conquer algorithm for the rectangle intersection problem - Güting, Schilling - 1984 |