The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree
 SIGMOD 2004, June 13-18, 2004, Paris, France
, 2004
Cited by 56 (7 self)
We present the Priority R-tree, or PR-tree, which is the first R-tree variant that always answers a window query using $O((N/B)^{1-1/d} + T/B)$ I/Os, where N is the number of d-dimensional (hyper-)rectangles stored in the R-tree, B is the disk block size, and T is the output size. This is provably asymptotically optimal and significantly better than other R-tree variants, where a query may visit all N/B leaves in the tree even when T = 0. We also present an extensive experimental study of the practical performance of the PR-tree using both real-life and synthetic data. This study shows that the PR-tree performs similarly to the best known R-tree variants on real-life and relatively nicely distributed data, but outperforms them significantly on more extreme data.
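To make the gap between the two bounds in this abstract concrete, the following is a small arithmetic sketch, not code from the paper. The function name and the sample values of N, B, T, and d are assumptions chosen for illustration, and constant factors hidden by the O-notation are ignored.

```python
def window_query_io_bounds(n, b, t, d):
    """Compare the PR-tree query bound with a scan of every leaf.

    Returns (pr_tree_bound, all_leaves), both in block transfers.
    Constant factors hidden by the O-notation are ignored.
    """
    leaves = n / b                            # N/B leaf blocks in the tree
    pr_bound = leaves ** (1 - 1 / d) + t / b  # (N/B)^(1-1/d) + T/B
    return pr_bound, leaves


# Hypothetical input: 10^8 rectangles, 10^4 per block, empty output, d = 2.
pr, leaves = window_query_io_bounds(n=10**8, b=10**4, t=0, d=2)
print(f"PR-tree bound: about {pr:.0f} I/Os; visiting all leaves: {leaves:.0f} I/Os")
```

Under these assumed values the bound evaluates to roughly 100 block transfers, against 10,000 if every leaf is visited.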
STXXL: Standard template library for XXL data sets
 In: Proc. of ESA 2005. Volume 3669 of LNCS
, 2005
Cited by 38 (5 self)
for processing huge data sets that can fit only on hard disks. It supports parallel disks, overlapping between disk I/O and computation, and it is the first I/O-efficient algorithm library that supports the pipelining technique that can save more than half of the I/Os. STXXL has been applied both in academic and industrial environments for a range of problems including text processing, graph algorithms, computational geometry, Gaussian elimination, visualization, and analysis of microscopic images, differential cryptographic analysis, etc. The performance of STXXL and its applications is evaluated on synthetic and real-world inputs. We present the design of the library, how its performance features are supported, and demonstrate how the library integrates with STL. KEY WORDS: very large data sets; software library; C++ standard template library; algorithm engineering
Implementing I/O-Efficient Data Structures Using TPIE
 In Proc. European Symposium on Algorithms
, 2002
Cited by 32 (6 self)
In recent years, many theoretically I/O-efficient algorithms and data structures have been developed. The TPIE project at Duke University was started to investigate the practical importance of these theoretical results. The goal of this ongoing project is to provide a portable, extensible, flexible, and easy-to-use C++ programming environment for efficiently implementing I/O algorithms and data structures. The TPIE library has been developed in two phases. The first phase focused on supporting algorithms with a sequential I/O pattern, while the recently developed second phase has focused on supporting online I/O-efficient data structures, which exhibit a more random I/O pattern. This paper describes the design and implementation of the second phase of TPIE.
FastSLAM: An efficient solution to the simultaneous localization and mapping problem with unknown data association
 Journal of Machine Learning Research
Cited by 31 (0 self)
This article provides a comprehensive description of FastSLAM, a new family of algorithms for the simultaneous localization and mapping problem, which specifically address hard data association problems. The algorithm uses a particle filter for sampling robot paths, and extended Kalman filters for representing maps acquired by the vehicle. This article presents two variants of this algorithm, the original algorithm along with a more recent variant that provides improved performance in certain operating regimes. In addition to a mathematical derivation of the new algorithm, we present a proof of convergence and experimental results on its performance on real-world data.
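The particle filter mentioned in this abstract relies on a resampling step that redraws particles in proportion to their importance weights. The following is a minimal sketch of generic systematic resampling, offered as an illustration only and not as the procedure from the FastSLAM article. The function name is hypothetical, and the particles here are plain labels, whereas a FastSLAM particle would carry a robot path together with per-landmark Kalman filter estimates.

```python
import random


def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples with probability proportional to weight.

    Generic systematic resampling: one random offset, then evenly
    spaced targets swept across the cumulative weight.
    """
    n = len(particles)
    step = sum(weights) / n
    offset = rng.random() * step      # single random draw in [0, step)
    resampled = []
    i = 0
    cumulative = weights[0]
    for k in range(n):
        target = offset + k * step
        # Advance to the particle whose weight interval contains target.
        while target >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
    return resampled


# Hypothetical weights: "a" has zero weight and should never survive.
survivors = systematic_resample(["a", "b", "c"], [0.0, 1.0, 3.0], random.Random(0))
print(survivors)
```

Using a single random offset keeps the variance of the resampled set low compared with drawing each particle independently, which is one common reason this variant is chosen.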
Cache-oblivious data structures for orthogonal range searching
 In Proc. ACM Symposium on Computational Geometry
, 2003
Cited by 23 (6 self)
We develop cache-oblivious data structures for orthogonal range searching, the problem of finding all T points in a set of N points in $\mathbb{R}^d$ lying in a query hyper-rectangle. Cache-oblivious data structures are designed to be efficient in arbitrary memory hierarchies. We describe a dynamic linear-size data structure that answers d-dimensional queries in $O((N/B)^{1-1/d} + T/B)$ memory transfers, where B is the block size of any two levels of a multilevel memory hierarchy. A point can be inserted into or deleted from this data structure in $O(\log_B^2 N)$ memory transfers. We also develop a static structure for the two-dimensional case that answers queries in $O(\log_B N + T/B)$ memory transfers using $O(N \log_2^2 N)$ space. The analysis of the latter structure requires that $B = 2^{2^c}$ for some non-negative integer constant c.
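For reference, the orthogonal range searching problem defined in this abstract can be stated as a brute-force scan. The sketch below only pins down what a correct answer is. It is not the cache-oblivious structure from the paper, and it reads every point, so it costs on the order of N/B memory transfers regardless of T. The function name and sample points are assumptions for illustration.

```python
def range_query(points, lo, hi):
    """Report every point p with lo[i] <= p[i] <= hi[i] in all d dimensions.

    Brute-force baseline: scans all N points for each query.
    """
    return [
        p for p in points
        if all(l <= x <= h for l, x, h in zip(lo, p, hi))
    ]


# Hypothetical planar input (d = 2) and query rectangle [1, 4] x [1, 4].
pts = [(1, 1), (2, 5), (4, 3), (6, 2)]
hits = range_query(pts, lo=(1, 1), hi=(4, 4))
print(hits)  # [(1, 1), (4, 3)], so T = 2
```

A baseline like this is useful mainly as a correctness oracle when testing a more elaborate index.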
Cache-Oblivious Planar Orthogonal Range Searching and Counting
 In Proc. ACM Symposium on Computational Geometry
, 2005
Cited by 15 (4 self)
We present the first cache-oblivious data structure for planar orthogonal range counting, and improve on previous results for cache-oblivious planar orthogonal range searching. Our range counting structure uses $O(N \log_2 N)$ space and answers queries using $O(\log_B N)$ memory transfers, where B is the block size of any memory level in a multilevel memory hierarchy. Using bit manipulation techniques, the space can be further reduced to $O(N)$. The structure can also be modified to support more general semigroup range sum queries in $O(\log_B N)$ memory transfers, using $O(N \log_2 N)$ space for three-sided queries and $O(N \log_2^2 N / \log_2 \log_2 N)$
Unsupervised clustering on dynamic databases
 Pattern Recognition Letters
, 2005
Cited by 10 (7 self)
Clustering algorithms typically assume that the available data constitute a random sample from a stationary distribution. As data accumulate over time the underlying process that generates them can change. Thus, the development of algorithms that can extract clustering rules in non-stationary environments is necessary. In this paper, we present an extension of the k-windows algorithm that can track the evolution of cluster models in dynamically changing databases, without a significant computational overhead. Experiments show that the k-windows algorithm can effectively and efficiently identify the changes on the pattern structure. © 2005 Elsevier B.V. All rights reserved.
M.N.: Novel approaches to unsupervised clustering through the k-windows algorithm
 Knowledge Mining. Studies in Fuzziness and Soft Computing
, 2005
Cited by 7 (6 self)
The extraction of meaningful information from large collections of data is a fundamental issue in science. To this end, clustering algorithms are typically employed to identify groups (clusters) of similar objects. A critical issue for any clustering algorithm is the determination of the number of clusters present in a dataset. In this contribution we present a clustering algorithm that, in addition to partitioning the data into clusters, approximates the number of clusters during its execution. We further present modifications of this algorithm for different distributed environments and dynamic databases. Finally, we present a modification of the algorithm that exploits the fractal dimension of the data to partition the dataset.
Active Learning in Regression, with Application to Stochastic Dynamic Programming
Cited by 5 (1 self)
We study active learning as a derandomized form of sampling. We show that full derandomization is not suitable in a robust framework, propose partially derandomized samplings, and develop new active learning methods (i) in which expert knowledge is easy to integrate, (ii) with a parameter for the exploration/exploitation dilemma, and (iii) less randomized than full-random sampling (yet also not deterministic). Experiments are performed in the case of regression for value-function learning on a continuous domain. Our main results are (i) efficient partially derandomized point sets, (ii) moderate-derandomization theorems, (iii) experimental evidence of the importance of the frontier, and (iv) a new regression-specific, user-friendly sampling tool that is less robust than blind samplers but sometimes works very efficiently in large dimensions. All experiments can be reproduced by downloading the source code and running the provided command line.
Simple and semi-dynamic structures for cache-oblivious planar orthogonal range searching
 In Proc. 22nd ACM Symposium on Computational Geometry
, 2006
Cited by 4 (3 self)
In this paper, we develop improved cache-oblivious data structures for two- and three-sided planar orthogonal range searching. Our main result is an optimal static structure for two-sided range searching that uses linear space and supports queries in $O(\log_B N + T/B)$ memory transfers, where B is the block size of any level in a multilevel memory hierarchy and T is the number of reported points. Our structure is the first linear-space cache-oblivious structure for a planar range searching problem with the optimal $O(\log_B N + T/B)$ query bound. The structure is very simple, and we believe it to be of practical interest. We also show that our two-sided range search structure can be constructed cache-obliviously in $O(N \log_B N)$ memory transfers. Using the logarithmic method and fractional cascading, this leads to a semi-dynamic linear-space structure that supports two-sided range queries in $O(\log_2 N + T/B)$ memory transfers and insertions in $O(\log_2 N \cdot \log_B N)$ memory transfers amortized. This structure is the first (semi-)dynamic structure for any planar range searching problem with a query bound that is logarithmic in the number of elements in the structure and linear in the output size. Finally, using a simple standard construction, we also obtain a static $O(N \log_2 N)$-space structure for three-sided range searching that supports queries in the optimal bound of $O(\log_B N + T/B)$ memory transfers. These bounds match the bounds of the best previously known structure for this