External Memory Data Structures
, 2001
Cited by 78 (34 self)
In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worst-case efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
Towards Optimal Locality in Mesh-Indexings
, 1997
Cited by 26 (4 self)
The efficiency of many data structures and algorithms relies on "locality-preserving" indexing schemes for meshes. We concentrate on the case in which the maximal distance between two mesh nodes indexed i and j shall be a slow-growing function of |i - j|. We present a new 2-D indexing scheme we call H-indexing, which has superior (possibly optimal) locality in comparison with the well-known Hilbert indexings. H-indexings form a Hamiltonian cycle and we prove that they are optimally locality-preserving among all cyclic indexings. We provide fairly tight lower bounds for indexings without any restriction. Finally, illustrated by investigations concerning 2-D and 3-D Hilbert indexings, we present a framework for mechanizing upper bound proofs for locality.
Parallel Domain Decomposition and Load Balancing Using Space-Filling Curves
- in Proceedings of the 4th IEEE Conference on High Performance Computing
, 1997
Cited by 18 (1 self)
Partitioning techniques based on space-filling curves have received much recent attention due to their low running time and good load balance characteristics. The basic idea underlying these methods is to order the multidimensional data according to a space-filling curve and partition the resulting one-dimensional order. However, space-filling curves are defined for points that lie on a uniform grid of a particular resolution. It is typically assumed that the coordinates of the points are representable using a fixed number of bits, and the run-times of the algorithms depend upon the number of bits used. In this paper, we present a simple and efficient technique for ordering arbitrary and dynamic multidimensional data using space-filling curves and its application to parallel domain decomposition and load balancing. Our technique is based on a comparison routine that determines the relative position of two points in the order induced by a space-filling curve. The comparison routine could then be used...
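As an illustration of what such a comparison routine can look like, here is a minimal sketch for one particular curve, the Z-order (Lebesgue/Morton) curve, restricted to points with non-negative integer coordinates. This is not the paper's routine, which targets arbitrary and dynamic data; the names `less_msb` and `z_order_cmp` are hypothetical, and treating dimension 0 as the most significant one is an assumption made here.

```python
from functools import cmp_to_key


def less_msb(a: int, b: int) -> bool:
    """Return True if the highest set bit of a is below the highest set bit of b."""
    return a < b and a < (a ^ b)


def z_order_cmp(p, q) -> int:
    """Compare two points in Z-order without building interleaved keys.

    Assumes non-negative integer coordinates and equal-length tuples.
    Dimension 0 is treated as the most significant dimension (an assumption).
    Returns -1, 0, or 1 like a classic comparison function.
    """
    dim = 0   # dimension holding the most significant differing bit so far
    best = 0  # XOR of the coordinates in that dimension
    for k in range(len(p)):
        diff = p[k] ^ q[k]
        if less_msb(best, diff):
            dim, best = k, diff
    return (p[dim] > q[dim]) - (p[dim] < q[dim])


# Usage: order the points of a small grid along the curve.
points = [(x, y) for x in range(2) for y in range(2)]
ordered = sorted(points, key=cmp_to_key(z_order_cmp))
```

The routine only inspects the two points being compared, so the resulting order can be handed to any comparison-based sort or search tree and then cut into contiguous pieces for partitioning.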
High Dimensional Similarity Search With Space Filling Curves
- In Proceedings of the 17th International Conference on Data Engineering
, 2000
Cited by 14 (1 self)
We present a new approach for approximate nearest neighbor queries for sets of high-dimensional points under any L_t-metric, t = 1, 2, 3, ... The proposed algorithm is efficient and simple to implement. The algorithm uses multiple shifted copies of the data points and stores them in up to (d + 1) B-trees, where d is the dimensionality of the data, sorted according to their position along a space-filling curve. This is done in a way that allows us to guarantee that a neighbor within an O(d^(1+1/t)) factor of the exact nearest can be returned with at most (d + 1) log_p n page accesses, where p is the branching factor of the B-trees. In practice, for real data sets, our approximate technique finds the exact nearest neighbor between 87% and 99% of the time and a point no farther than the third nearest neighbor between 98% and 100% of the time. Our solution is dynamic, allowing insertion or deletion of points in O(d log_p n) page accesses and generalizes easily to find approximate k-nea...
On Multi-Dimensional Hilbert Indexings
- Theory of Computing Systems
, 1998
Cited by 13 (1 self)
Indexing schemes for grids based on space-filling curves (e.g., Hilbert indexings) find applications in numerous fields, ranging from parallel processing through data structures to image processing. Because of an increasing interest in discrete multi-dimensional spaces, indexing schemes for them have attracted considerable interest. Hilbert curves are the simplest and most popular space-filling indexing scheme. We extend the concept of curves with Hilbert property to arbitrary dimensions and present first results concerning their structural analysis that also simplify their applicability. We define and analyze in a precise mathematical way r-dimensional Hilbert indexings for arbitrary r ≥ 2. Moreover, we generalize and simplify previous work and clarify the concept of Hilbert curves for multi-dimensional grids. As we show, Hilbert indexings can be completely described and analyzed by "generating elements of order 1", thus, in comparison with previous work, reducing their structural comp...
Incremental Constructions con BRIO
, 2003
Cited by 11 (0 self)
Randomized incremental constructions are widely used in computational geometry, but they perform very badly on large data because of their inherently random memory access patterns. We define a biased randomized insertion order which removes enough randomness to significantly improve performance, but leaves enough randomness so that the algorithms remain theoretically optimal.
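The abstract does not spell out the construction, so the following is only a sketch of one way a biased randomized insertion order can be built, as we understand the idea: points are assigned to rounds by repeated coin flips (the last round receives about half of the points, the one before it about half of the rest, and so on), and each round is then inserted in a locality-preserving order. The function name `brio_order` and the `local_key` parameter are hypothetical, and the lexicographic key in the usage lines is a stand-in for a position along a space-filling curve.

```python
import random


def brio_order(points, local_key, rng=None):
    """Return the points in a biased randomized insertion order (sketch).

    Rounds are chosen at random, which keeps the order random at a coarse
    scale; within a round, points are sorted by local_key, which keeps
    consecutive insertions close together in space and memory.
    """
    rng = rng or random.Random()
    rounds = []
    remaining = list(points)
    while len(remaining) > 1:
        late, early = [], []
        for p in remaining:
            # Each remaining point joins the current (later) round with probability 1/2.
            (late if rng.random() < 0.5 else early).append(p)
        rounds.append(late)
        remaining = early
    rounds.append(remaining)
    rounds.reverse()  # earliest (smallest) round first

    order = []
    for round_points in rounds:
        order.extend(sorted(round_points, key=local_key))
    return order


# Usage with a stand-in key; a real application would use a curve position.
sample = [(x, y) for x in range(4) for y in range(4)]
insertion_order = brio_order(sample, local_key=lambda p: p, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the order reproducible, which is convenient when comparing running times across insertion orders.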
On Multidimensional Curves with Hilbert Property
, 2000
Cited by 7 (0 self)
Indexing schemes for grids based on space-filling curves (e.g., Hilbert curves) find applications in numerous fields, ranging from parallel processing through data structures to image processing. Because of an increasing interest in discrete multidimensional spaces, indexing schemes for them have attracted considerable interest. Hilbert curves are the simplest and most popular space-filling indexing schemes. We extend the concept of curves with Hilbert property to arbitrary dimensions and present first results concerning their structural analysis that also simplify their applicability.
Hybrid overlay structure based on random walks
- In Proc. of the 4th Intl. Workshop on Peer-to-Peer Systems (IPTPS'05)
, 2005
Cited by 6 (1 self)
Application-level multicast on structured overlays often suffers from several drawbacks: 1) the regularity of the architecture makes it difficult to adapt to topology changes; 2) the uniformity of the protocol generally does not consider node heterogeneity. It would be ideal to combine the scalability of these overlays with the flexibility of an unstructured topology. In this paper, we propose a locality-aware hybrid overlay that combines the scalability and interface of a structured network with the connection flexibility of an unstructured network. Nodes self-organize into structured clusters based on network locality, while connections between clusters are created adaptively through random walks. Simulations show that this structure is efficient in both delay and bandwidth. The network also supports the scalable fast rendezvous interface provided by structured overlays, resulting in fast membership operations.
Scanning and sequential decision making for multi-dimensional data - Part I: the noiseless case
- IEEE Trans. Inform. Theory
, 2007
Cited by 5 (1 self)
We consider the problem of sequential decision making on random fields corrupted by noise. In this scenario, the decision maker observes a noisy version of the data, yet is judged with respect to the clean data. In particular, we first consider the problem of sequentially scanning and filtering noisy random fields. In this case, the sequential filter is given the freedom to choose the path over which it traverses the random field (e.g., noisy image or video sequence), thus it is natural to ask what the best achievable performance is and how sensitive this performance is to the choice of the scan. We formally define the problem of scanning and filtering, derive a bound on the best achievable performance and quantify the excess loss occurring when non-optimal scanners are used, compared to optimal scanning and filtering. We then discuss the problem of sequential scanning and prediction of noisy random fields. This setting is a natural model for applications such as restoration and coding of noisy images. We formally define the problem of scanning and prediction of a noisy multidimensional array and relate the optimal performance to the clean scandictability defined by Merhav and Weissman. Moreover, bounds on the excess loss due to sub-optimal scans are derived, and a universal prediction algorithm is suggested.
On the Quality of Partitions based on Space-Filling Curves
, 2002
Cited by 3 (0 self)
This paper presents bounds on the quality of partitions induced by space-filling curves. We compare the surface that surrounds an arbitrary index range with the optimal partition in the grid, i.e., the square. It is shown that partitions induced by Lebesgue and Hilbert curves behave about 1.85 times worse with respect to the length of the surface. The Lebesgue indexing gives better results than the Hilbert indexing in worst-case analysis. Furthermore, the surface of partitions based on the Lebesgue indexing is at most 3 times larger than the optimum in the average case.
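To make the compared quantity concrete, here is a small sketch that measures the surface of the partition induced by an index range of the 2-D Lebesgue (Z-order) indexing. It is an illustration under stated assumptions, not the paper's analysis: the function names `z_decode` and `range_surface` are hypothetical, the surface is counted as the number of unit cell edges whose neighboring cell lies outside the range, and edges on the border of the grid are counted as well.

```python
def z_decode(index: int, bits: int):
    """Map a Z-order index to (x, y); x takes the even bits, y the odd bits."""
    x = y = 0
    for b in range(bits):
        x |= ((index >> (2 * b)) & 1) << b
        y |= ((index >> (2 * b + 1)) & 1) << b
    return x, y


def range_surface(start: int, stop: int, bits: int) -> int:
    """Count boundary edges of the cells with Z-order index in [start, stop)."""
    cells = {z_decode(i, bits) for i in range(start, stop)}
    surface = 0
    for x, y in cells:
        for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if neighbor not in cells:
                surface += 1
    return surface


# Usage: an aligned range forms a square, an unaligned one may be disconnected.
aligned = range_surface(0, 4, bits=2)    # the 2 x 2 block at the origin
unaligned = range_surface(1, 3, bits=2)  # two cells touching only at a corner
```

A square with the same number of cells has the smallest possible surface, so dividing `range_surface` by that reference value gives the kind of ratio the abstract bounds.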

