Results 1–10 of 34
Blind evaluation of nearest neighbor queries using space transformation to preserve location privacy
 In SSTD’07
Abstract

Cited by 77 (9 self)
Abstract. In this paper we propose a fundamental approach to perform the class of Nearest Neighbor (NN) queries, the core class of queries used in many of the location-based services, without revealing the origin of the query in order to preserve the privacy of this information. The idea behind our approach is to utilize one-way transformations to map the space of all static and dynamic objects to another space and resolve the query blindly in the transformed space. However, in order to become a viable approach, the transformation used should be able to resolve NN queries in the transformed space accurately and, more importantly, prevent malicious use of transformed data by untrusted entities. Traditional encryption-based techniques incur an expensive O(n) computation cost (where n is the total number of points in space) and possibly logarithmic communication cost for resolving a KNN query. This is because such approaches treat points as vectors in space and do not exploit their spatial properties. In contrast, we use Hilbert curves as efficient one-way transformations and design algorithms to evaluate a KNN query in the Hilbert-transformed space. Consequently, we reduce the complexity of computing a KNN query to O(K × 2^(2N)/n) and of transferring the results to the client to O(K), respectively, where N, the Hilbert curve degree, is a small constant. Our results show that we very closely approximate the result set generated from performing KNN queries in the original space while enforcing our new location privacy metrics termed u-anonymity and a-anonymity, which are stronger and more generalized privacy measures than the commonly used K-anonymity and cloaked-region-size measures.
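The one-way Hilbert mapping this abstract builds on can be sketched with the classic iterative point-to-index algorithm. This is a generic textbook version in Python, not the authors' implementation; the function name and parameters are ours:

```python
def xy2d(n, x, y):
    """Map the 2-D cell (x, y) to its distance along a Hilbert curve
    covering an n x n grid, where n is a power of two (curve degree
    N = log2(n)).  Classic iterative quadrant-rotation formulation."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so the sub-curve has canonical orientation.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d
```

A trusted client would map every point through such a transformation (with secret curve parameters) before outsourcing, so the server resolves queries over one-dimensional indices without seeing the original locations.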
XenoSearch: Distributed Resource Discovery in the XenoServer Open Platform
 In Proceedings of HPDC
, 2003
Abstract

Cited by 56 (3 self)
We describe the XenoSearch system for performing expressive resource discovery searches in a distributed environment. We represent server metadata, such as their locations and facilities, as points in a multidimensional space and then express queries as predicates over these points. Each XenoSearch node holds a portion of this space, and the key goal of XenoSearch is to direct queries to those nodes containing the metadata of matching XenoServers. Communication between these XenoSearch nodes is based on the self-organizing Pastry peer-to-peer routing substrate. Our initial performance evaluation on a wide-area prototype shows that queries are only a factor of 3–5 times longer than basic Pastry routing, while supporting multidimensional searches of arbitrary shapes.
New Challenges in Dynamic Load Balancing
 APPL. NUMER. MATH
, 2004
Abstract

Cited by 30 (5 self)
Data partitioning and load balancing are important components of parallel computations. Many different partitioning strategies have been developed, with great effectiveness in parallel applications. But the load-balancing problem is not yet solved completely; new applications and architectures require new partitioning features. Existing algorithms must be enhanced to support more complex applications. New models are needed for non-square, non-symmetric, and highly connected systems arising from applications in biology, circuits, and materials simulations. Increased use of heterogeneous computing architectures requires partitioners that account for non-uniform computing, network, and memory resources. And, for greatest impact, these new capabilities must be delivered in toolkits that are robust, easy to use, and applicable to a wide range of applications. In this paper, we discuss our approaches to addressing these issues within the Zoltan Parallel Data Services toolkit.
Feature statistical retrieval applied to content-based copy identification
 In International Conference on Image Processing
, 2004
Abstract

Cited by 20 (10 self)
In many image or video retrieval systems, the search for similar objects in the database includes a spatial access method over a multidimensional feature space. This step is generally considered a problem independent of the features and the similarity type. The well-known multidimensional nearest neighbor search has also been widely studied by the database community as a generic method. In this paper, we propose a novel strategy dedicated to pseudo-invariant feature retrieval and more specifically applied to content-based copy identification. The range of a query is computed during the search according to deviation statistics between original and observed features. Furthermore, this approximate search range is directly mapped onto a Hilbert space-filling curve, allowing efficient access to the database. Experimental results give excellent response times for very large databases on both synthetic and real data. This work is used in a TV monitoring system including more than 13,000 hours of video in the reference database.
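The idea of deriving the query range from deviation statistics can be sketched as follows. The estimator used here (per-dimension mean shift plus k standard deviations of the observed original-vs-copy distortion) is our own illustrative choice, not the paper's exact model, and all names are hypothetical:

```python
import statistics

def statistical_range(query, deviations, k=3.0):
    """Widen a feature-space query per dimension using distortion
    statistics.  `query` is the feature vector; `deviations[i]` is a
    list of observed (copy - original) deviations for dimension i.
    Returns the lower and upper bounds of the search range."""
    lo, hi = [], []
    for q, samples in zip(query, deviations):
        mu = statistics.mean(samples)       # systematic shift
        sigma = statistics.stdev(samples)   # spread of the distortion
        lo.append(q + mu - k * sigma)
        hi.append(q + mu + k * sigma)
    return lo, hi
```

The resulting box can then be mapped onto intervals of a space-filling curve, so the range reflects how much each feature actually deviates under copying rather than a fixed ε.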
Statistical similarity search applied to content-based video copy detection
 in ICDE Workshops
, 2005
Abstract

Cited by 11 (1 self)
Abstract—Content-based copy detection (CBCD) is one of the emerging multimedia applications for which there is a need for a concerted effort from the database community and the computer vision community. Recent methods based on interest points and local fingerprints have been proposed to perform robust CBCD of images and video. They include two steps: the search for similar fingerprints in the database and a voting strategy that merges all the local results in order to make a global decision. In most image or video retrieval systems, the search for similar features in the database is performed by a geometrical query in a multidimensional index structure. Recently, the paradigm of approximate k-nearest-neighbors queries has shown that trading quality for time can be widely profitable in that context. In this paper, we propose a new approximate search paradigm dedicated to local fingerprints, and we describe the original indexing structure we have developed to compute the corresponding queries efficiently. We consider that the distribution of the relevant fingerprints around a query can be modeled by the distribution of the distortion vector between a referenced fingerprint and a candidate one. Experimental results show that these statistical queries allow high performance gains compared to classical ε-range queries. By studying the influence of this approximate search on a complete CBCD scheme based on local video fingerprints, we also show that trading quality for time during the search does not seriously degrade the global robustness of the system, even with very large databases including more than 10,000 hours of video.
Query Integrity Assurance of Location-based Services Accessing Outsourced Spatial Databases
 In SSTD
, 2009
Abstract

Cited by 11 (4 self)
Abstract. Outsourcing data to third-party data providers is becoming a common practice for data owners to avoid the cost of managing and maintaining databases. Meanwhile, due to the popularity of location-based services (LBS), the need for spatial data (e.g., gazetteers, vector data) is increasing exponentially. Consequently, we are witnessing a new trend of outsourcing spatial datasets by data collectors. Two main challenges with outsourcing datasets are to keep the data private (from the data provider) and to ensure the integrity of the query result (for the clients). Unfortunately, most of the techniques proposed for privacy and integrity do not extend to spatial data in a straightforward manner. Hence, recent studies proposed various techniques to support either privacy or integrity (but not both) on spatial datasets. In this paper, for the first time, we propose a technique that can ensure both privacy and integrity for outsourced spatial data. In particular, we first use a one-way spatial transformation method based on Hilbert curves, which encrypts the spatial data before outsourcing and hence ensures its privacy. Next, by probabilistically replicating a portion of the data and encrypting it with a different encryption key, we devise a technique for the client to audit the trustworthiness of the query results. We show the applicability of our approach for both k-nearest-neighbor and spatial range queries, the building blocks of any LBS application. Finally, we evaluate the validity and performance of our algorithms with real-world datasets.
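The replication-based audit can be sketched roughly as follows. Everything here is hypothetical: the function names, the toy "encryption" callables, and the fixed seed; a real scheme would use proper encryption and a keyed pseudorandom choice of which items to replicate:

```python
import random

def build_outsourced(data, replicate_prob, enc1, enc2, seed=0):
    """Owner side: encrypt every item under key 1, and probabilistically
    replicate a subset under a second key 2.  The server cannot tell
    replicas apart from ordinary encrypted items."""
    rng = random.Random(seed)  # owner and client share this choice
    primary = [enc1(x) for x in data]
    replicas = [enc2(x) for x in data if rng.random() < replicate_prob]
    return primary + replicas

def audit(result, known_replicated, enc2):
    """Client side: every replicated item relevant to the query must
    also appear in its second encryption; a missing replica means the
    server dropped or tampered with part of the answer."""
    returned = set(result)
    return all(enc2(x) in returned for x in known_replicated)
```

Because the client knows both keys and the replication rule, a server that silently omits results is caught with probability growing in the replication rate.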
Neighbor-finding based on space-filling curves
 Information Systems
, 2005
Abstract

Cited by 9 (1 self)
Nearest neighbor finding is one of the most important spatial operations in the field of spatial data structures concerned with proximity. Because the goal of space-filling curves is to preserve spatial proximity, nearest neighbor queries can be handled by these space-filling curves. When data is ordered by the Peano curve, we can directly compute the sequence numbers of the neighboring blocks next to the query block in 8 directions in the 2-D space based on its bit-shuffling property. But when data is ordered by the RBG curve or the Hilbert curve, neighbor finding is complex. However, we observe that there is some relationship between the RBG curve and the Peano curve, as with the Hilbert curve. Therefore, in this paper, we first show the strategy based on the Peano curve for the nearest neighbor query. Next, we present the rules for transformation between the Peano curve and the other two curves, including the RBG curve and the Hilbert curve, such that we can also efficiently find the nearest neighbor by the strategies based on these two curves. From our simulation, we show that the strategy based on the Hilbert curve requires the least total time (the CPU time and the I/O time) to process the nearest neighbor query among our three strategies, since it provides a good clustering property.
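The bit-shuffling property mentioned for the Peano curve (i.e., Morton/Z-order interleaving) is what makes neighbor sequence numbers directly computable; a minimal Python sketch, with our own function names:

```python
def interleave(x, y, bits=8):
    """Bit-shuffling (Morton) key of cell (x, y): the cell's sequence
    number along the Z-order curve.  x bits occupy even positions,
    y bits odd positions."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def neighbor_keys(x, y, side):
    """Sequence numbers of the up-to-8 blocks adjacent to (x, y) on a
    side x side grid; boundary cells simply have fewer neighbors."""
    return [interleave(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
            and 0 <= x + dx < side and 0 <= y + dy < side]
```

Each neighbor's key is obtained by re-shuffling the shifted coordinates, with no curve traversal; it is this directness that the Hilbert and RBG curves lack without the transformation rules the paper develops.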
An indexing framework for efficient retrieval on the cloud
 IEEE Data Engineering Bulletin
Abstract

Cited by 7 (1 self)
The emergence of the Cloud system has simplified the deployment of large-scale distributed systems for software vendors. The Cloud system provides a simple and unified interface between vendor and user, allowing vendors to focus more on the software itself rather than the underlying framework. Existing Cloud systems seek to improve performance by increasing parallelism. In this paper, we explore an alternative solution, proposing an indexing framework for the Cloud system based on the structured overlay. Our indexing framework reduces the amount of data transferred inside the Cloud and facilitates the deployment of database back-end applications.
Compact Hilbert Indices: Space-filling curves for domains with unequal side lengths
, 2007
Abstract

Cited by 5 (0 self)
In this paper we define a new compact Hilbert index which, while maintaining all of the advantages of the standard Hilbert curve, permits spaces with unequal dimension cardinalities. The compact Hilbert index can be used in any application that would previously have relied on Hilbert curves but, in the case of unequal side lengths, provides a more memory-efficient representation. This advantage is particularly important in distributed applications (parallel, P2P, and Grid), in which not only is memory space saved but communication volume is significantly reduced.
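The memory saving behind a compact index can be illustrated with a Morton-style interleaving that skips exhausted dimensions. Note this sketches only the bit-packing idea on a Z-order curve, not the paper's compact Hilbert construction, and the names are ours:

```python
def compact_interleave(coords, bit_widths):
    """Interleave coordinate bits most-significant first, but only for
    dimensions that still have bits at that level.  A point in a
    2^6 x 2^2 grid thus packs into 6 + 2 = 8 bits instead of the
    2 * 6 = 12 a naive equal-width interleaving would use.
    Returns (key, total_bits_used)."""
    max_bits = max(bit_widths)
    key = 0
    used = 0
    for i in reversed(range(max_bits)):
        for c, w in zip(coords, bit_widths):
            if i < w:  # this dimension still contributes a bit here
                key = (key << 1) | ((c >> i) & 1)
                used += 1
    return key, used
```

The total key length is exactly the sum of the per-dimension bit widths, which is the saving the compact Hilbert index achieves while additionally keeping the Hilbert curve's locality properties.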
Spatial Ordering and Encoding for Geographic Data Mining and Visualization
Abstract

Cited by 4 (0 self)
Geographic information (e.g., locations, networks, and nearest neighbors) is unique and different from aspatial attributes (e.g., population, sales, or income). It is a challenging problem in spatial data mining to take into account both the geographic information and multiple aspatial variables in the detection of patterns. To tackle this problem, we present and evaluate a variety of spatial ordering methods that can transform spatial relations into a one-dimensional ordering and encoding that preserves spatial locality as much as possible. The ordering can then be used to spatially sort temporal or multivariate data series and thus help reveal patterns across different geographical spaces. The encoding, as a materialization of spatial clusters and neighboring relations, is then amenable to processing together with aspatial variables by any existing (non-spatial) data mining methods. We design a set of measures to evaluate nine different ordering/encoding methods, including two space-filling curves, six hierarchical-clustering-based methods, and a one-dimensional Sammon mapping (a multidimensional scaling approach). Evaluation results with various data distributions show that the optimal ordering/encoding with the complete-linkage clustering consistently gives the best overall performance, surpassing well-known space-filling curves in preserving ...