PRIVE: Anonymous Location-Based Queries in Distributed Mobile Systems
 In WWW ’07: Proceedings of the 16th International Conference on World Wide Web
, 2007
Cited by 93 (5 self)
Network coordinates in the wild
 In Proceedings of USENIX NSDI’07
, 2007
Cited by 80 (2 self)
Network coordinates provide a mechanism for selecting and placing servers efficiently in a large distributed system. This approach works well as long as the coordinates continue to accurately reflect network topology. We conducted a long-term study of a subset of a million-plus node coordinate system and found that it exhibited some of the problems for which network coordinates are frequently criticized, for example, inaccuracy and fragility in the presence of violations of the triangle inequality. Fortunately, we show that several simple techniques remedy many of these problems. Using the Azureus BitTorrent network as our testbed, we show that live, large-scale network coordinate systems behave differently than their tame PlanetLab and simulation-based counterparts. We find higher relative errors, more triangle inequality violations, and higher churn. We present and evaluate a number of techniques that, when applied to Azureus, efficiently produce accurate and stable network coordinates.
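The two quantities named in this abstract, triangle inequality violations and relative error, can be illustrated with a minimal sketch. It assumes a small symmetric latency matrix `rtt` (in milliseconds) and the common definition of relative error as |predicted - measured| / measured; the function names and the sample data are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def count_triangle_violations(rtt):
    """Count ordered triples (a, b, c) whose direct latency a->c
    exceeds the detour latency a->b plus b->c."""
    n = len(rtt)
    violations = 0
    for a, b, c in permutations(range(n), 3):
        if rtt[a][c] > rtt[a][b] + rtt[b][c]:
            violations += 1
    return violations


def relative_error(predicted, measured):
    """Relative error of a coordinate-based latency prediction
    (assumed definition: absolute error divided by the measurement)."""
    return abs(predicted - measured) / measured


# Hypothetical 3-node latency matrix: the direct path 0 <-> 2 is much
# slower than the detour through node 1.
rtt = [
    [0, 10, 100],
    [10, 0, 10],
    [100, 10, 0],
]
violations = count_triangle_violations(rtt)
error = relative_error(12.0, 10.0)
```

The brute-force count is cubic in the number of nodes, so it is only suitable for small samples of a large system.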
Using Space-filling Curves for Multidimensional Indexing
 Lecture Notes in Computer Science
, 2000
Cited by 48 (1 self)
This paper presents and discusses a radically different approach to multidimensional indexing based on the concept of the space-filling curve. It reports on the novel algorithms which had to be developed to create the first actual implementation of a system based on this approach, on some comparative performance tests, and on its actual use within the TriStarp Group at Birkbeck to provide a Triple Store repository. An important result that goes beyond this requirement, however, is that the performance improvement over the Grid File is greater the higher the dimension. 1 Introduction Underlying any dbms is some form of repository management system or data store. The classic and dominant model for such repositories is that of some form of logical record or data aggregate type with a collection of instances conforming to that type usually termed a file. Such file systems are, of course, also used directly in many applications. The data model of a dbms may be radically different f...
Outlier mining in large high-dimensional data sets
 IEEE Transactions on Knowledge and Data Engineering
, 2005
Cited by 44 (3 self)
In this paper a new definition of distance-based outlier and an algorithm, called HilOut, designed to efficiently detect the top n outliers of a large and high-dimensional data set are proposed. Given an integer k, the weight of a point is defined as the sum of the distances separating it from its k nearest neighbors. Outliers are those points scoring the largest values of weight. The algorithm HilOut makes use of the notion of space-filling curve to linearize the data set, and it consists of two phases. The first phase provides an approximate solution, within a rough factor, after the execution of at most d + 1 sorts and scans of the data set, with temporal cost quadratic in d and linear in N and in k, where d is the number of dimensions of the data set and N is the number of points in the data set. During this phase, the algorithm isolates candidate outlier points and reduces this set at each iteration. If the size of this set becomes n, then the algorithm stops, reporting the exact solution. The second phase calculates the exact solution with a final scan that further examines the candidate outliers remaining after the first phase. Experimental results show that the algorithm always stops, reporting the exact solution, during the first phase after far fewer than d + 1 steps. We present both an in-memory and a disk-based implementation of the HilOut algorithm and a thorough scaling analysis for real and synthetic data sets showing that the algorithm scales well in both cases.
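The outlier definition in this abstract (weight = sum of distances to the k nearest neighbors, outliers = the n points with the largest weight) can be sketched with a brute-force baseline. This is not the HilOut algorithm, which avoids the quadratic scan by linearizing the data with a space-filling curve; it only illustrates the definition. Euclidean distance and the function names are assumptions made for the example.

```python
import heapq
import math


def knn_weight(points, i, k):
    """Sum of the distances from points[i] to its k nearest neighbors."""
    dists = sorted(
        math.dist(points[i], q) for j, q in enumerate(points) if j != i
    )
    return sum(dists[:k])


def top_n_outliers(points, n, k):
    """Indices of the n points with the largest k-nearest-neighbor weight,
    computed by exhaustive comparison (quadratic in the number of points)."""
    weights = [(knn_weight(points, i, k), i) for i in range(len(points))]
    return [i for _, i in heapq.nlargest(n, weights)]


# Hypothetical data: four clustered points and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
outliers = top_n_outliers(points, n=1, k=2)
```

On this sample the isolated point at index 4 has by far the largest weight, so it is the single reported outlier.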
Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data
, 2007
Cited by 40 (2 self)
The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate attention from the research community. In this paper, we design specialized algorithms that apply to indexed multidimensional data and fully exploit the characteristics of the problem. Experiments on synthetic datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach, while our results on real datasets show the meaningfulness of top-k dominating queries.
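The query semantics described in this abstract can be sketched as a naive quadratic scan. This is not one of the paper's index-based algorithms; it only shows what the query returns. The sketch assumes the usual skyline convention that smaller values are better in every dimension, and the function names and sample points are hypothetical.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(
        a < b for a, b in zip(p, q)
    )


def top_k_dominating(points, k):
    """Return (index, score) pairs for the k points that dominate the
    most other points; ties are broken by the lower index."""
    scores = [
        (sum(dominates(p, q) for q in points), i)
        for i, p in enumerate(points)
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in scores[:k]]


# Hypothetical 2-D dataset.
points = [(1, 1), (2, 2), (3, 3), (1, 3), (3, 1)]
result = top_k_dominating(points, k=2)
```

Because the score is a count of dominated objects, rescaling any dimension leaves the result unchanged, which matches property (iii) in the abstract.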
Querying multidimensional data indexed using the Hilbert space-filling curve
 SIGMOD Record
, 2001
Cited by 36 (0 self)
Mapping to one-dimensional values and then using a one-dimensional indexing method has been proposed as a way of indexing multidimensional data. Most previous related work uses the Z-Order Curve but more recently the Hilbert Curve has been considered since it has superior clustering properties. Any approach, however, can only be of practical value if there are effective methods for executing range and partial match queries. This paper describes such a method for the Hilbert Curve.
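The mapping to one-dimensional values mentioned in this abstract can be illustrated for the two-dimensional case with the standard iterative Hilbert index computation. This sketch covers only the forward mapping from a grid cell to its position on the curve; it is not the range or partial match query method the paper describes, and it assumes a square grid whose side `n` is a power of two. The function name is hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve that fills an
    n-by-n grid, where n is assumed to be a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at this level contributes s*s cells to the index.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Order the cells of a 4x4 grid by their one-dimensional Hilbert value.
cells = sorted(
    ((x, y) for x in range(4) for y in range(4)),
    key=lambda c: hilbert_index(4, c[0], c[1]),
)
```

Sorting by this value yields a path in which consecutive cells are always grid neighbors, which is the clustering property the abstract refers to.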
New Challenges in Dynamic Load Balancing
 APPL. NUMER. MATH
, 2004
Cited by 33 (6 self)
Data partitioning and load balancing are important components of parallel computations. Many different partitioning strategies have been developed, with great effectiveness in parallel applications. But the load-balancing problem is not yet solved completely; new applications and architectures require new partitioning features. Existing algorithms must be enhanced to support more complex applications. New models are needed for non-square, non-symmetric, and highly connected systems arising from applications in biology, circuits, and materials simulations. Increased use of heterogeneous computing architectures requires partitioners that account for non-uniform computing, network, and memory resources. And, for greatest impact, these new capabilities must be delivered in toolkits that are robust, easy-to-use, and applicable to a wide range of applications. In this paper, we discuss our approaches to addressing these issues within the Zoltan Parallel Data Services toolkit.
Complexity Reduction Methods for Fractal Image Compression
 IN: I.M.A. CONF. PROC. ON IMAGE PROCESSING; MATHEMATICAL METHODS AND APPLICATIONS
, 1994
Cited by 30 (7 self)
Fractal image compression allows fast decoding but suffers from long encoding times. During the encoding a large number of sequential searches through a list of domains (portions of the image) are carried out while trying to find a best match for another image portion called a range. In this article we review and extend the methods that have been developed to reduce the time complexity of this searching. We also present a new taxonomy of the methods, provide an evaluation and propose two new techniques.
Low Latency Photon Mapping Using Block Hashing
 IN PROCEEDINGS OF THE CONFERENCE ON GRAPHICS HARDWARE 2002
, 2002
Cited by 24 (1 self)
Photon mapping is useful in the acceleration of global illumination and caustic effects computed by path tracing. For hardware accelerated rendering, photon maps would be especially useful for simulating caustic lighting effects on non-Lambertian surfaces. For this to be possible, an efficient hardware algorithm for the computation of the k nearest neighbours to a sample point is required. Existing
Multi-Dimensional Top-k Dominating Queries
Cited by 23 (1 self)
The top-k dominating query returns k data objects which dominate the highest number of objects in a dataset. This query is an important tool for decision support since it provides data analysts an intuitive way for finding significant objects. In addition, it combines the advantages of top-k and skyline queries without sharing their disadvantages: (i) the output size can be controlled, (ii) no ranking functions need to be specified by users, and (iii) the result is independent of the scales at different dimensions. Despite their importance, top-k dominating queries have not received adequate attention from the research community. This paper is an extensive study on the evaluation of top-k dominating queries. First, we propose a set of algorithms that apply to indexed multidimensional data. Second, we investigate query evaluation on data that are not indexed. Finally, we study a relaxed variant of the query which considers dominance in dimensional subspaces. Experiments using synthetic and real datasets demonstrate that our algorithms significantly outperform a previous skyline-based approach. We also illustrate the applicability of this multidimensional analysis query by studying the meaningfulness of its results on real data.