Should SDBMS Support a Join Index?: A Case
"... Given a spatial crime data warehouse, that is updated infrequently and a set of operations O as well as constraints of storage and update overheads, the index type selection problem is to find a set of index types that can reduce the I/O cost of the set of operations. The index type selection proble ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Given a spatial crime data warehouse that is updated infrequently, a set of operations O, and constraints on storage and update overheads, the index type selection problem is to find a set of index types that reduces the I/O cost of the set of operations. The index type selection problem is important for improving user experience and system resource utilization in crucial spatial statistics application domains such as mapping and analysis for public safety, public health, ecology, and transportation, because an effective choice of index types can significantly improve the response time of frequent queries based on the set of operations. Many spatial statistical queries in these domains use a spatial neighborhood matrix, known as W in spatial statistics, which corresponds to a spatial self-join in spatial database terminology. Currently supported index types such as the B-Tree and R-Tree families do not adequately support spatial statistical analysis because they require on-the-fly computation of the W-Matrix, which slows the analysis down. In contrast, this paper argues that Spatial Database Management Systems (SDBMS) should support a join index that materializes the W-Matrix and eliminates on-the-fly computation of this common self-join. A detailed case study using CrimeStat, a popular spatial statistical software package for public safety, shows that join indices can significantly speed up spatial analyses such as the calculation of Ripley’s K and the identification of hotspots.
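To make the join-index idea concrete, here is a minimal Python sketch, assuming 2-D points with Euclidean distance and a naive Ripley's K estimator without edge correction. The self-join over all point pairs within a hypothetical `max_dist` is computed once and stored sorted by distance (standing in for a materialized W-Matrix); K, or a row of a thresholded W, for any distance up to `max_dist` is then a binary search instead of a fresh self-join. All function names are illustrative and are not CrimeStat's or any SDBMS's API.

```python
import bisect
import math
from itertools import combinations


def build_join_index(points, max_dist):
    """Materialize the spatial self-join once: every pair (i, j), i < j,
    whose Euclidean distance is at most max_dist, sorted by distance.
    Returns (dists, pairs) as parallel lists."""
    entries = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = math.dist(p, q)
        if d <= max_dist:
            entries.append((d, i, j))
    entries.sort()
    dists = [d for d, _, _ in entries]
    pairs = [(i, j) for _, i, j in entries]
    return dists, pairs


def ripley_k(dists, n, area, d):
    """Naive Ripley's K (no edge correction): area / (n(n-1)) times the number
    of ordered pairs within distance d. Only valid for d <= max_dist."""
    unordered = bisect.bisect_right(dists, d)  # unordered pairs with distance <= d
    return area * 2 * unordered / (n * (n - 1))


def neighbors(pairs, dists, i, d):
    """Row i of a binary W-Matrix with threshold d, read from the index."""
    limit = bisect.bisect_right(dists, d)
    return sorted({b if a == i else a for a, b in pairs[:limit] if i in (a, b)})
```

Once the index is built, evaluating K at many distances (as a K-function plot does) costs one binary search per distance rather than one self-join per distance, which is the kind of saving the case study attributes to join indices.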
Metric Space Similarity Joins
"... Similarity join algorithms find pairs of objects that lie within a certain distance ɛ of each other. Algorithms that are adapted from spatial join techniques are designed primarily for data in a vector space and often employ some form of a multi-dimensional index. For these algorithms, when the data ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Similarity join algorithms find pairs of objects that lie within a certain distance ɛ of each other. Algorithms adapted from spatial join techniques are designed primarily for data in a vector space and often employ some form of multidimensional index. When the data lies in a metric space, the usual solution for these algorithms is to embed the data in a vector space and then use a multidimensional index. This approach has several drawbacks when the data is high-dimensional, since the most discriminating dimensions must eventually be found, which is not trivial. In addition, although the maximum distance between objects increases with dimension, the ability to discriminate between objects in each dimension does not. These drawbacks are overcome by a new method called Quickjoin, which does not require a multidimensional index and instead adapts techniques from distance-based indexing in a method that is conceptually similar to the Quicksort algorithm. A formal analysis of the Quickjoin method is provided, and experiments show that it significantly outperforms two existing techniques.
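The Quicksort-like partitioning idea can be sketched in Python as follows. This is a simplified reading, assuming ball partitioning around a random pivot `p` with radius `r` equal to the distance from `p` to a second random object. By the triangle inequality, any qualifying pair split across the boundary must have both members within ɛ of it, so only those two "window" sets need to be cross-joined. For brevity the window join here is a plain nested loop (a fuller implementation would likely partition the windows recursively too), and `small` and `seed` are hypothetical parameters. Only the metric `dist` is used, so no vector embedding or multidimensional index is required.

```python
import math
import random


def quickjoin(objs, eps, dist=math.dist, small=8, seed=0):
    """Return sorted index pairs (i, j), i < j, with dist(objs[i], objs[j]) <= eps.
    dist must be a metric; small (>= 1) is the size at which recursion stops."""
    out = []
    _partition_join(list(range(len(objs))), objs, eps, dist, small,
                    random.Random(seed), out)
    return sorted(out)


def _self_join(ids, objs, eps, dist, out):
    for k, i in enumerate(ids):
        for j in ids[k + 1:]:
            if dist(objs[i], objs[j]) <= eps:
                out.append((min(i, j), max(i, j)))


def _cross_join(ids_a, ids_b, objs, eps, dist, out):
    for i in ids_a:
        for j in ids_b:
            if dist(objs[i], objs[j]) <= eps:
                out.append((min(i, j), max(i, j)))


def _partition_join(ids, objs, eps, dist, small, rng, out):
    if len(ids) <= small:
        _self_join(ids, objs, eps, dist, out)
        return
    p, q = rng.sample(ids, 2)
    r = dist(objs[p], objs[q])            # ball radius around pivot p
    inner, outer, inner_win, outer_win = [], [], [], []
    for i in ids:
        d = dist(objs[i], objs[p])
        if d < r:
            inner.append(i)
            if d >= r - eps:              # within eps inside the boundary
                inner_win.append(i)
        else:
            outer.append(i)
            if d <= r + eps:              # within eps outside the boundary
                outer_win.append(i)
    if not inner or not outer:            # degenerate split (e.g. r == 0): brute force
        _self_join(ids, objs, eps, dist, out)
        return
    _partition_join(inner, objs, eps, dist, small, rng, out)
    _partition_join(outer, objs, eps, dist, small, rng, out)
    # Pairs split across the boundary can only come from the two windows.
    _cross_join(inner_win, outer_win, objs, eps, dist, out)
```

Each qualifying pair is reported exactly once: pairs on the same side are found by the recursive call for that side, and pairs split across the boundary only by the window join.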
K Nearest Neighbor Queries and KNN-Joins in Large Relational Databases (Almost) for Free
"... Abstract — Finding the ..."
Pipelined Spatial Join Processing for Quadtree-based Indexes
"... Spatial join is an important yet costly operation in spatial databases. In order to speed up the execution of a spatial join, the input tables are often indexed based on their spatial attributes. The quadtree index structure is a well-known index for organizing spatial database objects. It has been ..."
Abstract
- Add to MetaCart
Spatial join is an important yet costly operation in spatial databases. To speed up the execution of a spatial join, the input tables are often indexed on their spatial attributes. The quadtree is a well-known index structure for organizing spatial database objects and has been implemented in several database management systems, e.g., Oracle Spatial and PostgreSQL (via SP-GiST). Queries typically involve multiple spatial join operators that are pipelined together in a query evaluation plan. To extend the applicability of these spatial joins, they are optimized so that, upon receiving sorted input, they produce sorted output for the spatial join operators in the upper levels of the query evaluation pipeline. This paper investigates quadtree-based spatial join algorithms and how they can be adapted to answer queries that involve multiple pipelined spatial joins in a query evaluation plan. It studies several adaptations of pipelined spatial join algorithms and their performance in three cases: when both input tables are indexed, when only one table is indexed and the other is sorted, and when both tables are sorted but not indexed.
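The "sorted input in, sorted output out" property can be illustrated with a merge join over linear-quadtree cells. This sketch assumes each cell is given as an inclusive range `(lo, hi)` of Z-order (Morton) codes, that cells within one input are nested or disjoint (as quadtree blocks are), and that both inputs are sorted by `(lo, -hi)`. It is a generic stack-based Z-order merge join, not one of the specific algorithms the paper evaluates. Each pair is emitted when its later-starting cell is reached, so the output is in non-decreasing order of that cell's `lo` and could feed another sort-based join operator without re-sorting.

```python
from typing import Iterator, List, Tuple

Cell = Tuple[int, int]  # inclusive (lo, hi) range of Z-order codes


def _order(cell: Cell) -> Tuple[int, int]:
    # Sort key: by start, larger (enclosing) cells first on ties.
    return (cell[0], -cell[1])


def zorder_merge_join(left: List[Cell], right: List[Cell]) -> Iterator[Tuple[int, int]]:
    """Yield (left_index, right_index) for every overlapping pair of cells.
    Both inputs must be sorted by _order, and cells within one input must be
    nested or disjoint. Pairs come out in non-decreasing order of the start of
    the later-starting cell, i.e. the output is itself sorted."""
    stacks = {"L": [], "R": []}  # open cells, each containing the sweep position
    i = j = 0
    while i < len(left) or j < len(right):
        if j >= len(right) or (i < len(left) and _order(left[i]) <= _order(right[j])):
            side, other, idx, cell = "L", "R", i, left[i]
            i += 1
        else:
            side, other, idx, cell = "R", "L", j, right[j]
            j += 1
        # Retire cells that end before the current cell starts.
        for stack in stacks.values():
            while stack and stack[-1][1][1] < cell[0]:
                stack.pop()
        # Every still-open cell on the other side overlaps the current cell.
        for other_idx, _ in stacks[other]:
            yield (idx, other_idx) if side == "L" else (other_idx, idx)
        stacks[side].append((idx, cell))
```

The design relies on quadtree cells never partially overlapping: two cells either nest or are disjoint, so a simple stack per input is enough to track which cells are still "open" at the sweep position.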
Intensional Associations in Dataspaces
"... Abstract — Dataspace applications necessitate the creation of associations among data items over time. For example, once information about people is extracted from sources such as webpages and blogs, associations among them may emerge as a consequence of different criteria, such as their city of ori ..."
Abstract
- Add to MetaCart
Dataspace applications require associations among data items to be created over time. For example, once information about people is extracted from sources such as webpages and blogs, associations among them may emerge from different criteria, such as their city of origin, their chosen hobbies, or their age group. In a set of personal data sources, we may wish to associate documents and emails by their modification dates or their authors. In this paper, we advocate a declarative approach to specifying these associations: each set of associations is defined by an association trail, a query-based definition of how items are connected by intensional (i.e., virtual) association edges to other items in the dataspace. The benefit of this mechanism is an intensional graph of associations among previously disconnected data items from different data sources. We study in detail the problem of processing neighborhood queries over these intensional association graphs. The naive approach is to materialize the whole graph and then apply previous work on dataspace graph indexing to answer queries. Because the intensional graph may have a number of edges quadratic in its number of nodes, the naive approach has worst-case quadratic indexing cost. We develop a novel indexing technique, the grouping-compressed index (GCI), that exploits association trail definitions to materialize the same intensional graph at linear cost. In addition, we present a query answering algorithm over GCI that avoids decompressing the graph to its quadratic size. In our experimental evaluation, GCI provides an order-of-magnitude gain in indexing cost over the naive approach while remaining competitive in query processing time.
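A toy Python sketch of why grouping avoids quadratic materialization, assuming the simplest kind of association trail: two items are associated when they share the same value of one attribute (e.g. city of origin). The naive approach enumerates every edge; the grouping approach stores each item once under its group key, so space is linear in the number of items, and a neighborhood query reads the item's group directly without ever expanding the edge set. `naive_edges` and `GroupingIndex` are hypothetical names; the paper's GCI handles general trail definitions and is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def naive_edges(items, attr):
    """Materialize every intensional edge {i, j} with equal attr values.
    A group of size g contributes g*(g-1)/2 edges: quadratic in the worst case."""
    return {
        frozenset((i, j))
        for i, j in combinations(sorted(items), 2)
        if items[i].get(attr) is not None and items[i].get(attr) == items[j].get(attr)
    }


class GroupingIndex:
    """Store each item once under its group key: linear space.
    Edges stay implicit ("compressed") in the group membership."""

    def __init__(self, items, attr):
        self.key_of = {}
        self.members = defaultdict(set)
        for item_id, record in items.items():
            key = record.get(attr)
            if key is not None:
                self.key_of[item_id] = key
                self.members[key].add(item_id)

    def neighbors(self, item_id):
        """Neighborhood query answered from the group, without an edge list."""
        key = self.key_of.get(item_id)
        if key is None:
            return set()
        return self.members[key] - {item_id}
```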
SORTING SPATIAL DATA BY SPATIAL OCCUPANCY
- GEOSPATIAL VISUAL ANALYTICS: GEOGRAPHICAL INFORMATION PROCESSING AND VISUAL ANALYTICS FOR ENVIRONMENTAL SECURITY, 2009
"... The increasing popularity of web-based mapping services such as Microsoft Virtual Earth and Google Maps/Earth has led to a dramatic increase in awareness of the importance of location as a component of data for the purposes of further processing as a means of enhancing the value of the nonspatial da ..."
Abstract
- Add to MetaCart
The increasing popularity of web-based mapping services such as Microsoft Virtual Earth and Google Maps/Earth has led to a dramatic increase in awareness of the importance of location as a component of data, both for further processing, as a means of enhancing the value of the nonspatial data, and for visualization. Both of these purposes inevitably involve searching, and the efficiency of searching depends on the extent to which the underlying data is sorted. The sorting is encapsulated by the data structure known as an index, which is used to represent the spatial data and thereby make it more accessible. The traditional role of an index is to sort the data, that is, to order it. However, since in general no ordering exists in dimensions greater than 1 without transforming the data to one dimension, the role of the sort process becomes one of differentiating between the data, and what is usually done is to sort the spatial objects with respect to the space that they occupy. The resulting ordering should be implicit rather than explicit so that the data need not be re-sorted (i.e., the index need not be rebuilt) when the queries change. Such indexes are said to order the space, and their characteristics are explored further.
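One common way to sort 2-D data by the space it occupies is a Z-order (Morton) key: each point is mapped to the grid cell that contains it, and the bits of the cell's coordinates are interleaved. The sketch below assumes non-negative coordinates and a uniform grid with a hypothetical `cell_size`; Z-order is only one possible space ordering (Hilbert order is another) and is shown as an example, not as the specific method explored here. Because the key depends only on location, the ordering is implicit in the space and remains valid when the queries change.

```python
def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of cell coordinates x and y (x in even bit positions)."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key


def sort_by_occupancy(points, cell_size, bits=16):
    """Order points by the Z-order key of the grid cell they occupy.
    Assumes non-negative coordinates smaller than cell_size * 2**bits."""
    return sorted(
        points,
        key=lambda p: morton_key(int(p[0] // cell_size), int(p[1] // cell_size), bits),
    )
```

For example, the four cells of a 2×2 grid come out in the order (0,0), (1,0), (0,1), (1,1), the familiar "Z" pattern that gives the ordering its name.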
Predictive Join Processing between Regions and Moving Objects
"... Abstract. The family of R-trees is suitable for indexing various kinds of multidimensional objects. TPR*-trees are R-tree based structures that have been proposed for indexing a moving object database, e.g. a database of moving boats. Region Quadtrees are suitable for indexing 2-dimensional regional ..."
Abstract
- Add to MetaCart
The family of R-trees is suitable for indexing various kinds of multidimensional objects. TPR*-trees are R-tree-based structures that have been proposed for indexing a moving object database, e.g., a database of moving boats. Region Quadtrees are suitable for indexing two-dimensional regional data, and their linear variant (Linear Region Quadtrees) is used for this purpose in many Geographical Information Systems (GIS), e.g., to represent stormy or sunny regions. Although both are tree structures, R-trees and Region Quadtrees differ in how they organize the data space, in the types of spatial data they store, and in the search algorithms applied to them. In this paper, we examine a spatio-temporal problem that appears in many practical applications: processing predictive joins between moving objects and regions (e.g., discovering the boats that will enter a storm), using these two families of data structures as storage and indexing mechanisms and taking into account their similarities and differences. Through a thorough experimental study, we show that a synchronous depth-first traversal order has the best performance balance on average, using I/O activity and response time as the performance measures. Keywords: Moving objects, TPR-trees, R-trees, linear quad-trees, query processing, joins.
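The core test in a predictive join is whether, and when, a moving object is inside a region during the query horizon. The following sketch assumes linear motion `pos + vel * t` (positions given at t = 0), axis-aligned rectangular regions `(xmin, ymin, xmax, ymax)`, and a per-axis slab intersection over time. It pairs every object with every region in a nested loop, which is only a baseline: the TPR*-tree and linear-quadtree traversals compared in the paper would prune most of these pairs and are not shown. The function names are hypothetical.

```python
def entry_interval(pos, vel, rect, t_start, t_end):
    """Return the sub-interval of [t_start, t_end] during which a point moving as
    pos + vel * t lies inside rect = (xmin, ymin, xmax, ymax), or None."""
    lo, hi = t_start, t_end
    for axis in range(2):
        p, v = pos[axis], vel[axis]
        rmin, rmax = rect[axis], rect[axis + 2]
        if v == 0:
            if not (rmin <= p <= rmax):   # never inside the slab on this axis
                return None
        else:
            t1, t2 = (rmin - p) / v, (rmax - p) / v
            if t1 > t2:                   # moving in the negative direction
                t1, t2 = t2, t1
            lo, hi = max(lo, t1), min(hi, t2)
    return (lo, hi) if lo <= hi else None


def predictive_join(objects, regions, t_start, t_end):
    """Nested-loop predictive join: (object id, region id, time interval) triples.
    objects maps id -> (pos, vel); regions maps id -> rect."""
    out = []
    for oid, (pos, vel) in objects.items():
        for rid, rect in regions.items():
            interval = entry_interval(pos, vel, rect, t_start, t_end)
            if interval is not None:
                out.append((oid, rid, interval))
    return out
```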
CROSS-BORDER TOPOLOGICAL JOIN OPTIMIZATION OF DISTRIBUTED SPATIAL DATA BASED ON ZONAL FRAGMENTATION
"... Spatial data fragmentation classifies in zonal fragmentation and layer fragmentation in distributed Spatial database. Because of the geospatial continuity and strong correlation between spatial data, cross-border query becomes an inherent problem in distributed spatial query based on zonal fragmenta ..."
Abstract
- Add to MetaCart
In a distributed spatial database, spatial data fragmentation is classified into zonal fragmentation and layer fragmentation. Because of the geospatial continuity of spatial data and the strong correlation between spatial data, cross-border queries are an inherent problem in distributed spatial querying based on zonal fragmentation, and cross-border fragment join optimization is a core issue. First, this paper discusses a general means of grouping fragment joins, dividing them into two groups, NCBJs and CBJs. Second, spatial joins are further classified by their spatial topological predicates. Then the optimization of four classes of CBJs is discussed in detail, and removing, filtering, and transforming rules are proposed, followed by the corresponding processing algorithms. Experiments designed to examine the proposed methods show that they greatly improve the efficiency of cross-border joins.
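As an illustration of what a filtering rule for a cross-border join can look like, consider two zonal fragments split by the vertical line `x = border_x`, and assume every object's minimum bounding rectangle (MBR) lies entirely within its own zone. Under that assumption, two MBRs from different zones can intersect only on the border itself, so only objects that reach the border (within a tolerance `tol`) need to be exchanged and compared. The zone layout, the `tol` parameter, and the function names are hypothetical, and this sketch is not the specific set of removing, filtering, and transforming rules the paper proposes.

```python
def mbr_intersects(a, b):
    """Closed-rectangle intersection test; MBRs are (xmin, ymin, xmax, ymax)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cross_border_intersects(zone_a, zone_b, border_x, tol=0.0):
    """Filter-then-test 'intersects' join across two zonal fragments.
    zone_a and zone_b map object ids to MBRs; zone_a is assumed to lie in
    x <= border_x and zone_b in x >= border_x."""
    # Filter: only objects reaching the shared border can join across it.
    edge_a = {k: m for k, m in zone_a.items() if m[2] >= border_x - tol}
    edge_b = {k: m for k, m in zone_b.items() if m[0] <= border_x + tol}
    # Test the remaining (much smaller) candidate sets pairwise.
    return sorted(
        (ka, kb)
        for ka, ma in edge_a.items()
        for kb, mb in edge_b.items()
        if mbr_intersects(ma, mb)
    )
```

In a distributed setting, the benefit of such a filter is that only the border-touching subset of each fragment needs to be shipped between sites before the pairwise test.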

