Results 1 - 10 of 13
XenoSearch: Distributed Resource Discovery in the XenoServer Open Platform
- In Proceedings of HPDC, 2003
Cited by 33 (3 self)
We describe the XenoSearch system for performing expressive resource discovery searches in a distributed environment. We represent server meta-data, such as their locations and facilities, as points in a multi-dimensional space and then express queries as predicates over these points. Each XenoSearch node holds a portion of this space and the key goal of XenoSearch is to direct queries to those nodes containing the meta-data of matching XenoServers. Communication between these XenoSearch nodes is based on the self-organizing Pastry peer-to-peer routing substrate. Our initial performance evaluation on a wide-area prototype shows that queries are only a factor of 3-5 times longer than basic Pastry routing, while supporting multi-dimensional searches of arbitrary shapes.
Blind evaluation of nearest neighbor queries using space transformation to preserve location privacy
- In SSTD’07
Cited by 24 (3 self)
In this paper we propose a fundamental approach to perform the class of Nearest Neighbor (NN) queries, the core class of queries used in many of the location-based services, without revealing the origin of the query in order to preserve the privacy of this information. The idea behind our approach is to utilize one-way transformations to map the space of all static and dynamic objects to another space and resolve the query blindly in the transformed space. However, in order to become a viable approach, the transformation used should be able to resolve NN queries in the transformed space accurately and more importantly prevent malicious use of transformed data by untrusted entities. Traditional encryption based techniques incur expensive O(n) computation cost (where n is the total number of points in space) and possibly logarithmic communication cost for resolving a KNN query. This is because such approaches treat points as vectors in space and do not exploit their spatial properties. In contrast, we use Hilbert curves as efficient one-way transformations and design algorithms to evaluate a KNN query in the Hilbert transformed space. Consequently, we reduce the complexity of computing a KNN query to $O(K \times 2^{2N} / n)$ and transferring the results to the client in $O(K)$, respectively, where N, the Hilbert curve degree, is a small constant. Our results show that we very closely approximate the result set generated from performing KNN queries in the original space while enforcing our new location privacy metrics termed u-anonymity and a-anonymity, which are stronger and more generalized privacy measures than the commonly used K-anonymity and cloaked region size measures.
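As a rough illustration of resolving a KNN query in a Hilbert-transformed space, the sketch below maps 2D grid cells to Hilbert indices with the widely used iterative conversion and then answers the query using only the one-dimensional indices. This is not the paper's algorithm: the function names (`xy_to_hilbert`, `knn_on_curve`) are hypothetical, the grid is assumed to be a power-of-two square, and the privacy side (keeping the curve parameters secret from the server) is not modeled.

```python
import bisect


def xy_to_hilbert(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def knn_on_curve(sorted_keys, q, k):
    """Return up to k stored Hilbert indices closest to q along the curve.

    Assumption: closeness on the curve is used as a stand-in for spatial
    closeness, so the result only approximates the true KNN set.
    """
    i = bisect.bisect_left(sorted_keys, q)
    lo, hi = i - 1, i
    out = []
    while len(out) < k and (lo >= 0 or hi < len(sorted_keys)):
        take_lo = hi >= len(sorted_keys) or (
            lo >= 0 and q - sorted_keys[lo] <= sorted_keys[hi] - q
        )
        if take_lo:
            out.append(sorted_keys[lo])
            lo -= 1
        else:
            out.append(sorted_keys[hi])
            hi += 1
    return out


# Usage: index a few points of interest on a 16x16 grid, then query blindly.
points = [(1, 1), (2, 3), (10, 12), (15, 0)]
keys = sorted(xy_to_hilbert(16, x, y) for x, y in points)
nearest = knn_on_curve(keys, xy_to_hilbert(16, 2, 2), 2)
```

The server in this sketch only ever sees curve indices, which is the property the abstract relies on; how well curve distance tracks spatial distance depends on the curve's clustering behavior.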
New Challenges in Dynamic Load Balancing
- APPL. NUMER. MATH, 2004
Cited by 15 (4 self)
Data partitioning and load balancing are important components of parallel computations. Many different partitioning strategies have been developed, with great effectiveness in parallel applications. But the load-balancing problem is not yet solved completely; new applications and architectures require new partitioning features. Existing algorithms must be enhanced to support more complex applications. New models are needed for non-square, non-symmetric, and highly connected systems arising from applications in biology, circuits, and materials simulations. Increased use of heterogeneous computing architectures requires partitioners that account for non-uniform computing, network, and memory resources. And, for greatest impact, these new capabilities must be delivered in toolkits that are robust, easy-to-use, and applicable to a wide range of applications. In this paper, we discuss our approaches to addressing these issues within the Zoltan Parallel Data Services toolkit.
Feature statistical retrieval applied to content-based copy identification
- In International Conference on Image Processing, 2004
Cited by 12 (6 self)
In many image or video retrieval systems, the search of similar objects in the database includes a spatial access method to a multidimensional feature space. This step is generally considered as a problem independent of the features and the similarity type. The well known multidimensional nearest neighbor search was also widely studied by the database community as a generic method. In this paper, we propose a novel strategy dedicated to pseudo-invariant features retrieval and more specifically applied to content-based copy identification. The range of a query is computed during the search according to deviation statistics between original and observed features. Furthermore, this approximate search range is directly mapped onto a Hilbert space-filling curve allowing an efficient access to the database. Experimental results give excellent response times for very large databases both on synthetic and real data. This work is used in a TV monitoring system including more than 13000 hours of video in the reference database.
Statistical similarity search applied to content-based video copy detection
- In ICDE Workshops, 2005
Cited by 7 (1 self)
Content-based copy detection (CBCD) is one of the emerging multimedia applications for which there is a need of a concerted effort from the database community and the computer vision community. Recent methods based on interest points and local fingerprints have been proposed to perform robust CBCD of images and video. They include two steps: the search of similar fingerprints in the database and a voting strategy that merges all the local results in order to make a global decision. In most image or video retrieval systems, the search of similar features in the database is performed by a geometrical query in a multidimensional index structure. Recently, the paradigm of approximate k-nearest neighbors query has shown that trading quality for time can be widely profitable in that context. In this paper, we propose a new approximate search paradigm dedicated to local fingerprints and we describe the original indexing structure we have developed to compute efficiently the corresponding queries. We consider that the distribution of the relevant fingerprints around a query can be modeled by the distribution of the distortion vector between a referenced fingerprint and a candidate one. Experimental results show that these statistical queries allow high performance gains compared to classical ε-range queries. By studying the influence of this approximate search on a complete CBCD scheme based on local video fingerprints, we also show that trading quality for time during the search does not degrade seriously the global robustness of the system, even with very large databases including more than 10,000 hours of video.
Neighbor-finding based on space-filling curves
- Information Systems, 2005
Cited by 5 (0 self)
Nearest neighbor-finding is one of the most important spatial operations in the field of spatial data structures concerned with proximity. Because the goal of the space-filling curves is to preserve the spatial proximity, the nearest neighbor queries can be handled by these space-filling curves. When data is ordered by the Peano curve, we can directly compute the sequence numbers of the neighboring blocks next to the query block in 8 directions in the 2D-space based on its bit shuffling property. But when data is ordered by the RBG curve or the Hilbert curve, neighbor-finding is complex. However, we observe that there is some relationship between the RBG curve and the Peano curve, as with the Hilbert curve. Therefore, in this paper, we first show the strategy based on the Peano curve for the nearest neighbor query. Next, we present the rules for transformation between the Peano curve and the other two curves, including the RBG curve and the Hilbert curve, such that we can also efficiently find the nearest neighbor by the strategies based on these two curves. From our simulation, we show that the strategy based on the Hilbert curve requires the least total time (the CPU-time and the I/O time) to process the nearest neighbor query among our three strategies, since it can provide the good clustering property.
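The bit-shuffling idea can be sketched as follows, assuming that "Peano curve" here means the bit-interleaved ordering often called Z-order or Morton order, which is what the "bit shuffling property" suggests. The helper names are hypothetical, and for clarity the sketch decodes a sequence number to coordinates and re-encodes each neighbor, rather than manipulating the sequence number directly as the paper describes.

```python
def interleave(x, y, bits):
    """Shuffle the bits of x and y into one sequence number (x in even positions)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def deinterleave(code, bits):
    """Recover (x, y) from a bit-interleaved sequence number."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


def neighbor_codes(code, bits):
    """Sequence numbers of the blocks adjacent to `code` in up to 8 directions.

    Blocks on the border of the 2**bits by 2**bits grid have fewer neighbors.
    """
    x, y = deinterleave(code, bits)
    side = 1 << bits
    result = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < side and 0 <= ny < side:
                result.append(interleave(nx, ny, bits))
    return result


# Usage: neighbors of the block at (1, 1) in a 4x4 grid.
around = neighbor_codes(interleave(1, 1, 2), 2)
```

For the Hilbert and RBG orderings the abstract notes that no such direct shuffle exists, which is why the paper converts between those curves and the Peano ordering first.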
Compact Hilbert indices
- 2006
Cited by 2 (0 self)
Space-filling curves are continuous self-similar functions which map compact multi-dimensional sets into one-dimensional ones. Since their invention they have found applications in a wide variety of fields [12, 21]. In the context of scientific computing and database systems, space-filling curves can significantly improve data reuse and request times because of their locality properties [9, 13, 15]. In particular, the Hilbert curve has been shown to be the best choice for these applications [21]. However, in database systems it is often the case that not all dimensions of the data have the same cardinality, leading to an inefficiency in the use of space-filling curves due to their being naturally constrained to spaces where all dimensions are of equal size. We explore the Hilbert curve, reproducing classical algorithms for their generation and manipulation through an intuitive and rigorous geometric approach. We then extend these basic results to construct compact Hilbert indices which are able to capture the ordering properties of the regular Hilbert curve but without the associated inefficiency in representation for spaces with mismatched dimensions.
Compact Hilbert Indices: Space-filling curves for domains with unequal side lengths
- 2007
Cited by 2 (0 self)
In this paper we define a new compact Hilbert index which, while maintaining all of the advantages of the standard Hilbert curve, permits spaces with unequal dimension cardinalities. The compact Hilbert index can be used in any application that would have previously relied on Hilbert curves but, in the case of unequal side lengths, provides a more memory efficient representation. This advantage is particularly important in distributed applications (Parallel, P2P and Grid), in which not only is memory space saved but communication volume is significantly reduced.
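The size advantage for unequal side lengths can be shown with a little arithmetic. The sketch below rests on two assumptions that are not spelled out in the abstract: a regular Hilbert index pads every dimension to the widest one, and a compact index needs only as many bits as the per-dimension precisions add up to. The function names are hypothetical.

```python
def standard_hilbert_bits(bits_per_dim):
    # Assumption: every dimension is padded to the precision of the widest one.
    return len(bits_per_dim) * max(bits_per_dim)


def compact_hilbert_bits(bits_per_dim):
    # Assumption: one index bit per bit of input precision.
    return sum(bits_per_dim)


# Three dimensions with 2**16, 2**4 and 2**2 distinct values respectively.
dims = [16, 4, 2]
print(standard_hilbert_bits(dims))  # 48
print(compact_hilbert_bits(dims))   # 22
```

Under these assumptions the two sizes coincide when all dimensions have equal precision, and the gap grows as the cardinalities diverge, which is the case the abstract singles out for distributed settings where index values are also sent over the network.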
SOG: A SELF-ORGANIZED GROUPING INFRASTRUCTURE FOR GRID RESOURCE DISCOVERY
- 2006
Cited by 1 (0 self)
Dynamic and heterogeneous characteristics of large-scale Grids make the fundamental problem of resource discovery a great challenge. This thesis presents a self-organized grouping (SOG) infrastructure that achieves efficient Grid resource discovery by forming and maintaining autonomous resource groups. Each group dynamically aggregates a set of resources that are similar to each other in some pre-specified resource characteristic. The SOG method takes advantage of the strengths of both centralized and decentralized approaches that were previously developed for Grid/P2P resource discovery. The design of the SOG method minimizes the overhead incurred in forming and maintaining groups and maximizes resource discovery performance. The way SOG method handles resource discovery queries is metaphorically similar to searching for a word in an English dictionary by identifying its alphabetical groups at the first place and then performing a lexical search within the group. The algorithms implemented in SOG method are illustrated with details. This thesis also illustrates a generalized approach using a space-filling curve
An indexing framework for efficient retrieval on the cloud
- IEEE Data Engineering Bulletin
Cited by 1 (1 self)
The emergence of the Cloud system has simplified the deployment of large-scale distributed systems for software vendors. The Cloud system provides a simple and unified interface between vendor and user, allowing vendors to focus more on the software itself rather than the underlying framework. Existing Cloud systems seek to improve performance by increasing parallelism. In this paper, we explore an alternative solution, proposing an indexing framework for the Cloud system based on the structured overlay. Our indexing framework reduces the amount of data transferred inside the Cloud and facilitates the deployment of database back-end applications.

