Similarity Indexing with the SS-tree
 In Proceedings of the 12th International Conference on Data Engineering
, 1996
Similarity Indexing: Algorithms and Performance
 In Proceedings SPIE Storage and Retrieval for Image and Video Databases
, 1996
Abstract

Cited by 125 (1 self)
Efficient indexing support is essential to allow content-based image and video databases using similarity-based retrieval to scale to large databases (tens of thousands up to millions of images). In this paper, we take an in-depth look at this problem. One of the major difficulties in solving this problem is the high dimension (6-100) of the feature vectors that are used to represent objects. We provide an overview of the work in computational geometry on this problem and highlight the results we found most useful in practice, including the use of approximate nearest neighbor algorithms. We also present a variant of the optimized k-d tree we call the VAM k-d tree, and provide algorithms to create an optimized R-tree we call the VAMSplit R-tree. We found that the VAMSplit R-tree provided better overall performance than all competing structures we tested for main memory and secondary memory applications. We observed large improvements in performance relative to the R*-tree and SS-tree in secondary memory applications, and modest improvements relative to optimized k-d tree variants. Nearest Neighbor Search
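As a rough illustration of the kind of structure this abstract discusses, the following Python sketch builds a k-d tree that splits on the dimension of maximum variance and answers exact nearest-neighbor queries. It is not the authors' VAM k-d tree or VAMSplit R-tree: the function names `build` and `nearest`, the median split point, and the `leaf_size` parameter (assumed to be at least 1) are all assumptions made for this example.

```python
import math
import statistics


def build(points, leaf_size=4):
    """Build a k-d tree over a list of equal-length tuples (hypothetical helper).

    Each internal node splits at the median of the dimension whose
    coordinate values have the largest variance.
    """
    if len(points) <= leaf_size:
        return {"leaf": True, "points": points}
    dims = len(points[0])
    # Choose the split dimension by maximum variance of the coordinates.
    dim = max(range(dims),
              key=lambda d: statistics.pvariance([p[d] for p in points]))
    points = sorted(points, key=lambda p: p[dim])
    mid = len(points) // 2
    return {
        "leaf": False,
        "dim": dim,
        "split": points[mid][dim],
        "left": build(points[:mid], leaf_size),    # coordinates <= split
        "right": build(points[mid:], leaf_size),   # coordinates >= split
    }


def nearest(node, query, best=None):
    """Return (distance, point) for the exact nearest neighbor of query."""
    if node["leaf"]:
        for p in node["points"]:
            d = math.dist(p, query)
            if best is None or d < best[0]:
                best = (d, p)
        return best
    diff = query[node["dim"]] - node["split"]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, query, best)
    # The far subtree can only help if the splitting plane is closer
    # than the best match found so far.
    if best is None or abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best
```

Calling `nearest(build(points), query)` returns the same distance as a linear scan over `points`; the pruning test on `abs(diff)` is what lets the search skip subtrees, and it becomes less effective as the dimension grows, which is the difficulty the abstract highlights.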
Bulk Loading the M-tree
 In Proceedings of the 9th Australasian Database Conference (ADC'98)
, 1998
Abstract

Cited by 35 (1 self)
The M-tree is a dynamic paged structure that can be effectively used to index multimedia databases, where objects are represented by means of complex features and similarity queries require the computation of time-consuming distance functions. The initial loading of the M-tree, however, can be very expensive. In this paper we propose a fast (bulk) loading algorithm to speed up the creation of the tree on a given dataset. Experimental results show that our BulkLoading algorithm can significantly improve the index's performance with respect to M-tree insertion methods, and its performance is comparable to that of static metric trees. 1 Introduction Content-based retrieval of objects is one of the most common operations required by the incoming multimedia (MM) era. Multimedia users often request images, sounds, texts, and videos from large repositories for medical, scientific, legal, and art applications, to name a few. To be efficiently retrieved, such objects are characterized and in...
Master-Client R-trees: A New Parallel R-tree Architecture
 In Proceedings of the 11th International Conference on Scientific and Statistical Database Management (SSDBM)
, 1998
Abstract

Cited by 15 (1 self)
Scientific databases must be able to efficiently run subset retrievals of multidimensional data sets. If the data sets are very large, significant retrieval speedups can be obtained via parallelism. In this paper we present a new parallel, distributed, shared-nothing R-tree architecture. To the best of our knowledge, this is the first significant experimental study demonstrating practical application of parallel R-trees in a shared-nothing environment. We provide experimental results demonstrating actual speedups for several synthetic and real data sets. In addition, we conduct experimental studies to investigate the effect of several declustering strategies and communication parameters. 1 Introduction Scientific databases can be used to store and retrieve output from numerical simulations. The outputs from numeric simulations are often very large. This size problem is further exacerbated by the use of parallel supercomputers resulting in grids and grid solution sets in the hundreds or th...
ClusterTree: Integration of Cluster Representation and Nearest Neighbor Search for Large Datasets with High Dimensionality
 IEEE International Conference on Multimedia and Expo, 2000
, 2000
Abstract

Cited by 14 (1 self)
In this paper, we introduce the ClusterTree, a new indexing approach to representing clusters generated by any existing clustering approach. A cluster is decomposed into several subclusters and represented as the union of the subclusters. The subclusters can be further decomposed, which isolates the most related groups within the clusters. A ClusterTree is a hierarchy of clusters and subclusters which incorporates the cluster representation into the index structure to achieve effective and efficient retrieval. Our cluster representation is highly adaptive to any kind of clusters. It is well accepted that most existing indexing techniques degrade rapidly when dimensionality goes higher. The ClusterTree can support the retrieval of the nearest neighbors effectively without having to linearly scan the high-dimensional dataset. We also discuss an approach to dynamically reconstruct the ClusterTree when new data are added. We present the detailed analysis of this approach and justify it extensively by experiments. Keywords: indexing, cluster representation, nearest neighbor search, high-dimensional datasets
Tensor product formulation for Hilbert space-filling curves
 In Proceedings of the 2003 International Conference on Parallel Processing
, 2003
Abstract

Cited by 9 (7 self)
We present a tensor product formulation for Hilbert space-filling curves. Both recursive and iterative formulas are expressed in the paper. We view a Hilbert space-filling curve as a permutation which maps two-dimensional 2^n × 2^n data elements stored in the row major or column major order to the order of traversing a Hilbert space-filling curve. The tensor product formula of Hilbert space-filling curves uses several permutation operations: stride permutation, radix-2 Gray permutation, transposition, and anti-diagonal transposition. The iterative tensor product formula can be manipulated to obtain the inverse Hilbert permutation. Also, the formulas are directly translated into computer programs which can be used in various applications including R-tree indexing, image processing, and process allocation, etc. Key words: tensor product, block recursive algorithm, Hilbert space-filling curve, stride
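The idea of treating a Hilbert curve as a permutation of row-major data can be sketched in a few lines of Python. The code below uses the widely known bit-manipulation conversion from grid coordinates to curve position, not the tensor product formulas of the paper; the function names `xy2d` and `hilbert_permutation` are assumptions for this example, and `n` is assumed to be a power of two.

```python
def xy2d(n, x, y):
    """Map cell (x, y) of an n-by-n grid to its position along the Hilbert curve.

    n must be a power of two. Classic iterative bit-manipulation method.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_permutation(n):
    """Return perm where perm[d] is the row-major index of the d-th curve cell."""
    perm = [0] * (n * n)
    for y in range(n):
        for x in range(n):
            perm[xy2d(n, x, y)] = y * n + x
    return perm


def reorder(data, n):
    """Rearrange a row-major list of n*n elements into Hilbert-curve order."""
    return [data[i] for i in hilbert_permutation(n)]
```

For example, `hilbert_permutation(2)` returns `[0, 2, 3, 1]`: the curve visits row-major cells 0, 2, 3, 1 in that order. Consecutive positions along the curve are always neighboring grid cells, which is the locality property that makes the ordering attractive for applications such as R-tree indexing.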
On R-trees with low stabbing number
 In Proc. Annual European Symposium on Algorithms
, 2002
Abstract

Cited by 8 (2 self)
The R-tree is a well-known bounding-volume hierarchy that is suitable for storing geometric data on secondary memory. Unfortunately, no good analysis of its query time exists. We describe a new algorithm to construct an R-tree for a set of planar objects that has provably good query complexity for point location queries and range queries with ranges of small width. For certain important special cases, our bounds are optimal. We also show how to update the structure dynamically, and we generalize our results to higher-dimensional spaces.
Object-Based and Image-Based Object Representations
 ACM Computing Surveys
, 2004
Abstract

Cited by 3 (0 self)
An overview is presented of object-based and image-based representations of objects by their interiors. The representations are distinguished by the manner in which they can be used to answer two fundamental queries in database applications: (1) Feature query: given an object, determine its constituent cells (i.e., their locations in space). (2) Location query: given a cell (i.e., a location in space), determine the identity of the object (or objects) of which it is a member as well as the remaining constituent cells of the object (or objects). Regardless of the representation that is used, the generation of responses to the feature and location queries is facilitated by building an index (i.e., the result of a sort) either on the objects or on their locations in space, and implementing it using an access structure that correlates the objects with the locations. Assuming the presence of an access structure, implicit (i.e., image-based) representations are described that are good for finding the objects associated with a particular location or cell (i.e., the location query), while requiring that all cells be examined when determining the locations associated with a particular object (i.e., the feature query). In contrast, explicit (i.e., object-based) representations are good for the feature query,
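The contrast between the two query types described in this abstract can be made concrete with a small Python sketch. The dictionary layouts and the function names `location_query`, `feature_query_scan`, and `build_object_index` are assumptions made for illustration; they are not taken from the survey.

```python
from collections import defaultdict


def location_query(grid, cell):
    """Image-based lookup: which object occupies this cell? One dictionary probe."""
    return grid.get(cell)


def feature_query_scan(grid, obj):
    """Feature query on an image-based representation.

    With no index on objects, every cell has to be examined.
    """
    return sorted(cell for cell, owner in grid.items() if owner == obj)


def build_object_index(grid):
    """Derive an object-based representation: object id -> list of its cells."""
    index = defaultdict(list)
    for cell, owner in grid.items():
        index[owner].append(cell)
    return index


# A tiny image-based representation: each occupied cell stores its object id.
example_grid = {(0, 0): "A", (0, 1): "A", (1, 1): "B"}
```

With `example_grid`, `location_query(example_grid, (0, 1))` is a single lookup, while `feature_query_scan(example_grid, "A")` walks all cells. After `build_object_index(example_grid)`, the feature query becomes a single lookup as well, at the cost of maintaining a second structure.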
Algebraic formulation and program generation of three-dimensional Hilbert space-filling curves
 In The 2004 International Conference on Imaging Science, Systems, and Technology
, 2004
Abstract

Cited by 3 (2 self)
We use a tensor product based multilinear algebra theory to formulate three-dimensional Hilbert space-filling curves. A 3D Hilbert space-filling curve is specified as a permutation which rearranges three-dimensional 2^n × 2^n × 2^n data elements stored in the row major order as in the C language or the column major order as in the FORTRAN language to the order of traversing a 3D Hilbert space-filling curve. The tensor product formulation of 3D Hilbert space-filling curves uses stride permutation, reverse permutation, and Gray permutation. We present both recursive and iterative tensor product formulas of 3D Hilbert space-filling curves. In addition, we derive a tensor product formula of inverse 3D Hilbert space-filling curve permutation. The tensor product formulas are directly translated into computer programs which can be used in various applications. The process of program generation is explained in the paper.