Results 11–20 of 185
Using Space-Filling Curves for Multidimensional Indexing
 Lecture Notes in Computer Science, 2000
Cited by 48 (1 self)
This paper presents and discusses a radically different approach to multidimensional indexing based on the concept of the space-filling curve. It reports the novel algorithms which had to be developed to create the first actual implementation of a system based on this approach, on some comparative performance tests, and on its actual use within the TriStarp Group at Birkbeck to provide a Triple Store repository. An important result that goes beyond this requirement, however, is that the performance improvement over the Grid File is greater the higher the dimension. 1 Introduction Underlying any dbms is some form of repository management system or data store. The classic and dominant model for such repositories is that of some form of logical record or data aggregate type with a collection of instances conforming to that type, usually termed a file. Such file systems are, of course, also used directly in many applications. The data model of a dbms may be radically different f...
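The core idea, mapping multidimensional keys onto a single curve position so an ordinary one-dimensional index can store them, can be sketched with the simpler Z-order (Morton) curve; note this is a generic illustration, not the Hilbert-curve algorithms the paper itself develops:

```python
def z_order_key(point, bits=16):
    """Interleave the bits of each coordinate to form a single
    Z-order (Morton) key; nearby points tend to receive nearby keys."""
    key = 0
    for bit in range(bits - 1, -1, -1):          # most significant bit first
        for coord in point:
            key = (key << 1) | ((coord >> bit) & 1)
    return key

# A 1-D ordered structure (here, a sorted list standing in for a B-tree)
# can then index multidimensional records by their curve key.
points = [(3, 1), (0, 0), (2, 2), (1, 3)]
indexed = sorted(points, key=z_order_key)
```

Range queries are then answered by translating a query rectangle into a set of key intervals on the curve, which is where most of the algorithmic difficulty lies.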
Outlier mining in large high-dimensional data sets
 IEEE Transactions on Knowledge and Data Engineering, 2005
Cited by 41 (3 self)
In this paper a new definition of distance-based outlier and an algorithm, called HilOut, designed to efficiently detect the top n outliers of a large and high-dimensional data set are proposed. Given an integer k, the weight of a point is defined as the sum of the distances separating it from its k nearest neighbors. Outliers are those points scoring the largest values of weight. The algorithm HilOut makes use of the notion of space-filling curve to linearize the data set, and it consists of two phases. The first phase provides an approximate solution, within a rough factor, after the execution of at most d + 1 sorts and scans of the data set, with temporal cost quadratic in d and linear in N and in k, where d is the number of dimensions of the data set and N is the number of points in the data set. During this phase, the algorithm isolates candidate outlier points and reduces this set at each iteration. If the size of this set becomes n, then the algorithm stops, reporting the exact solution. The second phase calculates the exact solution with a final scan examining further the candidate outliers remaining after the first phase. Experimental results show that the algorithm always stops, reporting the exact solution, during the first phase after much less than d + 1 steps. We present both an in-memory and disk-based implementation of the HilOut algorithm and a thorough scaling analysis for real and synthetic data sets showing that the algorithm scales well in both cases.
Range nearest-neighbor query
 IEEE Transactions on Knowledge and Data Engineering (TKDE)
Cited by 40 (2 self)
A range nearest-neighbor (RNN) query retrieves the nearest neighbor (NN) for every point in a range. It is a natural generalization of point and continuous nearest-neighbor queries and has many applications. In this paper, we consider the ranges as (hyper)rectangles and propose efficient in-memory processing and secondary-memory pruning techniques for RNN queries in both 2D and high-dimensional spaces. These techniques are generalized for kRNN queries, which return the k nearest neighbors for every point in the range. In addition, we devise an auxiliary solution-based index, EXO-tree, to speed up any type of NN query. EXO-tree is orthogonal to any existing NN processing algorithm and thus can be transparently integrated. An extensive empirical study was conducted to evaluate the CPU and I/O performance of these techniques, and the study showed that they are efficient and robust under various datasets, query ranges, numbers of nearest neighbors, dimensions, and cache sizes.
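The semantics of an RNN query can be illustrated by sampling locations inside the rectangle and collecting each location's NN; this is only a naive approximation for intuition, the paper's contribution is computing the exact answer set with pruning rather than sampling:

```python
import math

def nn(loc, data):
    """Nearest data point to a query location."""
    return min(data, key=lambda p: math.dist(loc, p))

def rnn_by_sampling(rect, data, steps=20):
    """Approximate a range NN query: collect the NN of sampled
    locations inside the (x1, y1, x2, y2) rectangle.  Illustration
    only; the exact result is the set of points that are NN of
    at least one location in the range."""
    x1, y1, x2, y2 = rect
    result = set()
    for i in range(steps + 1):
        for j in range(steps + 1):
            loc = (x1 + (x2 - x1) * i / steps,
                   y1 + (y2 - y1) * j / steps)
            result.add(nn(loc, data))
    return result

data = [(0, 0), (10, 10), (50, 50)]
print(rnn_by_sampling((0, 0, 12, 12), data))
```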
Scalability analysis of declustering methods for multidimensional range queries
 IEEE Trans. on Knowledge and Data Eng., 1998
Cited by 38 (20 self)
Abstract—Efficient storage and retrieval of multi-attribute data sets has become one of the essential requirements for many data-intensive applications. The Cartesian product file has been known as an effective multi-attribute file structure for partial-match and best-match queries. Several heuristic methods have been developed to decluster Cartesian product files across multiple disks to obtain high performance for disk accesses. Although the scalability of the declustering methods becomes increasingly important for systems equipped with a large number of disks, no analytic studies have been done so far. In this paper, we derive formulas describing the scalability of two popular declustering methods, Disk Modulo and Fieldwise Xor, for range queries, which are the most common type of queries. These formulas disclose the limited scalability of the declustering methods, and this is corroborated by extensive simulation experiments. From the practical point of view, the formulas given in this paper provide a simple measure that can be used to predict the response time of a given range query and to guide the selection of a declustering method under various conditions.
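The two methods analyzed assign a bucket with field indices (i1, ..., id) to one of M disks; in their usual formulations, Disk Modulo sums the indices modulo M and Fieldwise Xor combines them bitwise (normally with M a power of two). A minimal sketch under those assumptions:

```python
from functools import reduce

def disk_modulo(bucket, num_disks):
    """Disk Modulo (DM): disk = (i1 + i2 + ... + id) mod M."""
    return sum(bucket) % num_disks

def fieldwise_xor(bucket, num_disks):
    """Fieldwise Xor (FX): disk = (i1 XOR i2 XOR ... XOR id) mod M.
    FX is normally used with M a power of two, where the mod is exact."""
    return reduce(lambda a, b: a ^ b, bucket) % num_disks

# A 2-D range query touches a rectangle of buckets; a good declustering
# spreads those buckets over many disks so they can be read in parallel.
query = [(i, j) for i in range(2, 6) for j in range(1, 4)]
print(sorted({disk_modulo(b, 8) for b in query}))
print(sorted({fieldwise_xor(b, 8) for b in query}))
```

For this particular query FX happens to touch all eight disks while DM touches six; the paper's formulas characterize how such spreads behave as M grows.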
Querying multidimensional data indexed using the Hilbert space-filling curve
 SIGMOD Record, 2001
Cited by 35 (0 self)
Mapping to one-dimensional values and then using a one-dimensional indexing method has been proposed as a way of indexing multidimensional data. Most previous related work uses the Z-Order Curve, but more recently the Hilbert Curve has been considered since it has superior clustering properties. Any approach, however, can only be of practical value if there are effective methods for executing range and partial match queries. This paper describes such a method for the Hilbert Curve.
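For the 2-D case, the commonly cited iterative algorithm for converting grid coordinates into a distance along the Hilbert curve looks like this; a sketch for an n-by-n grid with n a power of two, not the paper's query-execution method:

```python
def hilbert_d(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) in an n-by-n
    grid (n a power of two).  At each scale: pick the quadrant, add
    its offset, then rotate/reflect the frame for the next level."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:               # rotate so the recursion lines up
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Sorting cells by hilbert_d yields the order in which a Hilbert-based
# index would store their keys.
cells = sorted(((x, y) for x in range(4) for y in range(4)),
               key=lambda c: hilbert_d(4, *c))
```

Unlike Z-order, consecutive Hilbert positions are always adjacent cells, which is the clustering property the abstract refers to.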
Comet: A scalable coordination space for decentralized distributed environments
 In Proceedings of the 2nd International Workshop on Hot Topics in Peer-to-Peer Systems (HOT-P2P 2005), 2005
Cited by 31 (8 self)
The scale, heterogeneity, and dynamism of emerging distributed and decentralized environments make coordination a significant and challenging problem. In this paper we present Comet, a scalable peer-to-peer content-based coordination space. Comet provides a global virtual shared space that can be associatively accessed by all peer nodes in the system, and access is independent of the physical location of the tuples or identifiers of the host. Dynamically constructed, transient coordination spaces are also supported to enable context locality to be explicitly exploited for improving system performance. The design, implementation, and experimental evaluation of Comet using the PlanetLab platform are presented.
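The associative access described here follows the classic tuple-space (Linda-style) model: tuples are retrieved by content, not by address. A toy single-process sketch of that model, with hypothetical method names; the actual Comet system distributes the space across a peer-to-peer overlay:

```python
class TupleSpace:
    """Minimal Linda-style space.  None in a template is a wildcard.
    (Toy illustration only; not Comet's distributed implementation.)"""
    def __init__(self):
        self.tuples = []

    def out(self, tup):
        """Insert a tuple into the space."""
        self.tuples.append(tup)

    def _match(self, template, tup):
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def rd(self, template):
        """Read (non-destructively) a matching tuple, else None."""
        return next((t for t in self.tuples if self._match(template, t)), None)

    def inp(self, template):
        """Take (remove) a matching tuple, else None."""
        t = self.rd(template)
        if t is not None:
            self.tuples.remove(t)
        return t

space = TupleSpace()
space.out(("task", 7, "pending"))
print(space.rd(("task", None, None)))   # matched by content, not address
```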
Parallel Domain Decomposition and Load Balancing Using Space-Filling Curves
 in Proceedings of the 4th IEEE Conference on High Performance Computing, 1997
Cited by 28 (2 self)
Partitioning techniques based on space-filling curves have received much recent attention due to their low running time and good load balance characteristics. The basic idea underlying these methods is to order the multidimensional data according to a space-filling curve and partition the resulting one-dimensional order. However, space-filling curves are defined for points that lie on a uniform grid of a particular resolution. It is typically assumed that the coordinates of the points are representable using a fixed number of bits, and the runtimes of the algorithms depend upon the number of bits used. In this paper, we present a simple and efficient technique for ordering arbitrary and dynamic multidimensional data using space-filling curves and its application to parallel domain decomposition and load balancing. Our technique is based on a comparison routine that determines the relative position of two points in the order induced by a space-filling curve. The comparison routine could then be used...
Querying the trajectories of online mobile objects
 In MobiDE 2001
Cited by 26 (2 self)
Position data is expected to play a central role in a wide range of mobile computing applications, including advertising, leisure, safety, security, tourist, and traffic applications. Applications such as these are characterized by large quantities of wirelessly internetworked, position-aware mobile objects that receive services where the objects' position is essential. The movement of an object is captured via sampling, resulting in a trajectory consisting of a sequence of connected line segments for each moving object. This paper presents a technique for querying these trajectories. The technique uses indices for the processing of spatio-temporal range queries on trajectories. If object movement is constrained by the presence of infrastructure, e.g., lakes, park areas, etc., the technique is capable of exploiting this to reduce the range query, the purpose being to obtain better query performance. Specifically, an algorithm is proposed that segments the original range query based on the infrastructure contained in its range. The applicability and limitations of the proposal are assessed via empirical performance studies with varying datasets and parameter settings.
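One simple way to segment a range query around infrastructure that objects cannot enter is to subtract the obstacle's rectangle from the query rectangle, leaving smaller sub-queries; this sketch handles a single axis-aligned obstacle, whereas the paper's algorithm handles general infrastructure in the range:

```python
def subtract(query, obstacle):
    """Split an axis-aligned query rectangle (x1, y1, x2, y2) into
    sub-rectangles covering it minus one obstacle rectangle.
    Sketch for a single obstacle only."""
    qx1, qy1, qx2, qy2 = query
    ix1, iy1 = max(qx1, obstacle[0]), max(qy1, obstacle[1])
    ix2, iy2 = min(qx2, obstacle[2]), min(qy2, obstacle[3])
    if ix1 >= ix2 or iy1 >= iy2:
        return [query]                      # no overlap: nothing to cut
    parts = []
    if qx1 < ix1: parts.append((qx1, qy1, ix1, qy2))   # left slab
    if ix2 < qx2: parts.append((ix2, qy1, qx2, qy2))   # right slab
    if qy1 < iy1: parts.append((ix1, qy1, ix2, iy1))   # bottom strip
    if iy2 < qy2: parts.append((ix1, iy2, ix2, qy2))   # top strip
    return parts

parts = subtract((0, 0, 10, 10), (2, 2, 4, 4))  # e.g. a lake at (2,2)-(4,4)
```

Each resulting sub-rectangle can then be issued against the trajectory index independently, which is the performance lever the abstract describes.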
Nearest Neighbor Search with Strong Location Privacy
2010
Cited by 25 (0 self)
The tremendous growth of the Internet has significantly reduced the cost of obtaining and sharing information about individuals, raising many concerns about user privacy. Spatial queries pose an additional threat to privacy because the location of a query may be sufficient to reveal sensitive information about the querier. In this paper we focus on k nearest neighbor (kNN) queries and define the notion of strong location privacy, which renders a query indistinguishable from any location in the data space. We argue that previous work fails to support this property for arbitrary kNN search. Towards this end, we introduce methods that offer strong location privacy, by integrating private information retrieval (PIR) functionality. Specifically, we employ secure hardware-aided PIR, which has been proven very efficient and is currently considered as a practical mechanism for PIR. Initially, we devise a benchmark solution building upon an existing PIR-based technique. Subsequently, we identify its drawbacks and present a novel scheme called AHG to tackle them. Finally, we demonstrate the performance superiority of AHG over our competitor, and its viability in applications demanding the highest level of privacy.
The Virtual Microscope
 IEEE Transactions on Information Technology in Biomedicine, 2002
Cited by 24 (6 self)
We present the design and implementation of the Virtual Microscope, a software system employing a client/server architecture to provide a realistic emulation of a high-power light microscope. The system provides a form of completely digital telepathology, allowing simultaneous access to archived digital slide images by multiple clients. The main problem the system targets is storing and processing the extremely large quantities of data required to represent a collection of slides. The Virtual Microscope client software runs on the end user's PC or workstation, while database software for storing, retrieving and processing the microscope image data runs on a parallel computer or on a set of workstations at one or more potentially remote sites. We have designed and implemented two versions of the data server software. One implementation is a customization of a database system framework that is optimized for a tightly coupled parallel machine with attached local disks. The second implementation is component-based, and has been designed to accommodate access to and processing of data in a distributed, heterogeneous environment. We also have developed caching client software, implemented in Java, to achieve good response time and portability across different computer platforms. The performance results presented show that the Virtual Microscope system scales well, so that many clients can be adequately serviced by an appropriately configured data server.