An Efficient k-Means Clustering Algorithm: Analysis and Implementation
, 2000
Abstract

Cited by 208 (3 self)
k-means clustering is a very popular clustering technique, which is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is very easy to implement. It differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real...
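For reference, the plain Lloyd iteration that the filtering algorithm accelerates can be sketched as below. This is the brute-force version that scans every center for every point; it does not build the kd-tree that the abstract describes. The function names, the random choice of initial centers, and the fixed iteration count are illustrative assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, iterations=20, seed=0):
    """Brute-force Lloyd's iteration (O(nk) work per iteration, no kd-tree).

    Hypothetical helper: initial centers are k distinct data points chosen at random.
    Returns the final centers and the distortion (mean squared distance to nearest center).
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: bucket each point by its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the centroid of its bucket.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its old center
                d = len(members[0])
                centers[j] = [sum(m[i] for m in members) / len(members) for i in range(d)]
    distortion = sum(min(squared_distance(p, c) for c in centers) for p in points) / len(points)
    return centers, distortion
```

For example, `lloyd_kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` settles on centers (0, 0.5) and (10, 10.5). The filtering algorithm produces the same sequence of centers as this loop, but it replaces the per-point scan with a traversal of a kd-tree built once over the data points.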
RIAC: Robust Intrinsically Motivated Exploration and Active Learning
 IEEE Transactions on Autonomous Mental Development
Abstract

Cited by 24 (9 self)
Abstract—Intelligent adaptive curiosity (IAC) was initially introduced as a developmental mechanism allowing a robot to self-organize developmental trajectories of increasing complexity without pre-programming the particular developmental stages. In this paper, we argue that IAC and other intrinsically motivated learning heuristics could be viewed as active learning algorithms that are particularly suited for learning forward models in unprepared sensorimotor spaces with large unlearnable subspaces. Then, we introduce a novel formulation of IAC, called robust intelligent adaptive curiosity (RIAC), and show that its performance as an intrinsically motivated active learning algorithm is far superior to that of IAC in a complex sensorimotor space where only a small subspace is neither unlearnable nor trivial. We also show results in which the learnt forward model is reused in a control scheme. Finally, open-source software accompanying this paper, containing these algorithms as well as tools to reproduce all of the experiments presented, is made publicly available. Index Terms—Active learning, artificial curiosity, developmental robotics, exploration, intrinsic motivation, sensorimotor learning.
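The core idea behind IAC-style curiosity, preferring regions where prediction error is currently falling, can be sketched as follows. This is a generic learning-progress heuristic under our own assumptions: the class name `Region`, the two-window progress measure, and the epsilon-greedy choice are hypothetical, and this does not reproduce RIAC's region splitting or its exact progress measure.

```python
import random
from collections import deque


class Region:
    """Hypothetical region of sensorimotor space with a sliding window of prediction errors."""

    def __init__(self, window=10):
        self.window = window
        self.errors = deque(maxlen=2 * window)  # keep an older and a recent window

    def record(self, error):
        self.errors.append(error)

    def learning_progress(self):
        """Drop in mean prediction error from the older window to the recent one."""
        if len(self.errors) < 2 * self.window:
            return 0.0  # not enough history yet
        errs = list(self.errors)
        older = sum(errs[: self.window]) / self.window
        recent = sum(errs[self.window:]) / self.window
        return older - recent  # positive while the forward model is improving here


def choose_region(regions, epsilon=0.1, rng=random):
    """Epsilon-greedy: usually pick the region with the highest learning progress."""
    if rng.random() < epsilon:
        return rng.randrange(len(regions))
    return max(range(len(regions)), key=lambda i: regions[i].learning_progress())
```

Under this measure, an unlearnable region (error stays high) and a trivial region (error stays near zero) both show roughly zero progress, so exploration drifts toward the region where errors are still decreasing. That is the behaviour the abstract argues suits spaces with large unlearnable subspaces.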
The Analysis of a Simple k-Means Clustering Algorithm
, 2000
Abstract

Cited by 19 (1 self)
k-means clustering is a very popular clustering technique, which is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is very easy to implement. It differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real data from...
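The abstract does not state the pruning rule. The step that lets a kd-tree cell "filter" candidate centers is usually described as a vertex test, sketched below under that assumption. Let z* be the candidate closest to the cell's midpoint. Another candidate z can be discarded for the whole cell if even the cell vertex farthest in the direction z − z* is no closer to z than to z*. The function names are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def is_farther(z, z_star, cell_lo, cell_hi):
    """True if no point of the box [cell_lo, cell_hi] is strictly closer to z than to z_star.

    The box vertex extreme in direction u = z - z_star is the point of the box most
    favourable to z; if z does not win there, it wins nowhere in the box.
    """
    v = [hi if zi > zsi else lo for zi, zsi, lo, hi in zip(z, z_star, cell_lo, cell_hi)]
    return squared_distance(z, v) >= squared_distance(z_star, v)


def filter_candidates(candidates, cell_lo, cell_hi):
    """Keep z* (closest to the cell midpoint) and every candidate that may own part of the cell."""
    mid = [(lo + hi) / 2 for lo, hi in zip(cell_lo, cell_hi)]
    z_star = min(candidates, key=lambda z: squared_distance(z, mid))
    return [z for z in candidates
            if z is z_star or not is_farther(z, z_star, cell_lo, cell_hi)]
```

In the unit square, for instance, the candidates (0.5, 0.5) and (1.2, 0.5) both survive, while a distant candidate at (5, 5) is pruned. Once only one candidate remains, every data point in that subtree can be assigned to it in bulk.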
Efficient Expected-Case Algorithms for Planar Point Location
, 2000
Abstract

Cited by 13 (4 self)
Planar point location is among the most fundamental search problems in computational geometry. Although this problem has been heavily studied from the perspective of worst-case query time, there has been surprisingly little theoretical work on expected-case query time. We are given an n-vertex planar polygonal subdivision S satisfying some weak assumptions (satisfied, for example, by all convex subdivisions). The goal is to preprocess S into a data structure so that queries can be answered efficiently. We assume that the two coordinates of each query point are generated independently by a probability distribution also satisfying some weak assumptions (satisfied, for example, by the uniform distribution). In the decision tree model of computation, it is well known from information theory that a lower bound on the expected number of comparisons is entropy(S). We provide two data structures, one of size O(n^2) that can answer queries in 2·entropy(S) + O(1) expected number...
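The lower bound refers to the entropy of the distribution over the faces of S, where each face is weighted by the probability that a query lands in it. A minimal sketch of that quantity follows. The face probabilities are treated as given inputs; computing them from a query distribution is outside this sketch.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits: sum of p * log2(1/p) over faces with p > 0."""
    total = sum(probabilities)
    if not math.isclose(total, 1.0):
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)
```

As a worked case, consider four faces hit with probabilities 1/2, 1/4, 1/8, 1/8. The entropy is 0.5·1 + 0.25·2 + 0.125·3 + 0.125·3 = 1.75 bits. Any comparison-based search therefore needs at least 1.75 expected comparisons, and the O(n^2)-size structure stays within 2·1.75 + O(1).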
Applications and Variations of Domination in Graphs
, 2000
Abstract

Cited by 12 (0 self)
In a graph G = (V, E), S ⊆ V is a dominating set of G if every vertex is either in S or joined by an edge to some vertex in S. Many different types of domination have been researched extensively. This dissertation explores some new variations and applications of dominating sets. We first introduce the concept of Roman domination. A Roman dominating function is a function f: V → {0, 1, 2} such that every vertex v for which f(v) = 0 has a neighbor w with f(w) = 2. This corresponds to a problem in army placement where every region is either defended by its own army or has a neighbor with two armies, in which case one of the two armies can be sent to the undefended region if a conflict breaks out. The weight of a Roman dominating function f is f(V) = Σ_{v∈V} f(v), and we are interested in finding Roman dominating functions of minimum weight. We explore the graph-theoretic, algorithmic, and complexity issues of Roman domination, including algorithms for finding minimum weight Roman dominating functions for trees and grids.
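The definition translates directly into a checker, shown below together with an exhaustive search for the minimum weight. The search tries all 3^n labelings, so it is only a reference for tiny graphs and is not one of the tree or grid algorithms the dissertation develops. The adjacency-dict input format is our assumption.

```python
from itertools import product


def is_roman_dominating(adj, f):
    """Every vertex v with f[v] == 0 must have a neighbor w with f[w] == 2."""
    return all(f[v] != 0 or any(f[w] == 2 for w in adj[v]) for v in adj)


def min_roman_weight(adj):
    """Minimum f(V) = sum of f(v) over all Roman dominating functions (3^n brute force)."""
    vertices = sorted(adj)
    best = None
    for labels in product((0, 1, 2), repeat=len(vertices)):
        weight = sum(labels)
        if best is not None and weight >= best:
            continue  # cannot improve on the best weight found so far
        if is_roman_dominating(adj, dict(zip(vertices, labels))):
            best = weight
    return best
```

On the path 0–1–2, placing two armies on the middle vertex gives weight 2. On the four-vertex path 0–1–2–3, a single "2" can cover at most three vertices, so the minimum rises to 3 (for example f(1) = 2, f(3) = 1).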
It's Okay to Be Skinny, If Your Friends Are Fat
 Center for Geometric Computing 4th Annual Workshop on Computational Geometry
, 1999
Abstract

Cited by 9 (2 self)
The kd-tree is a popular and simple data structure for range searching and nearest neighbor searching. Such a tree subdivides space into rectangular cells through the recursive application of some splitting rule. The choice of splitting rule affects the shape of cells and the structure of the resulting tree. It has been shown that an important element in achieving efficient query times for approximate queries is that each cell should be fat, meaning that the ratio of its longest side to its shortest side (its aspect ratio) should be bounded. Subdivisions with fat cells satisfy a property called the packing constraint, which bounds the number of disjoint cells of a given size that can overlap a ball of a given radius. We consider a splitting rule called the sliding-midpoint rule. It has been shown to provide efficient search times for approximate nearest neighbor and range searching, both in practice and in terms of expected-case query time. However, it has not been possible to pro...
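A single sliding-midpoint split can be sketched as follows, based on our understanding of the rule. Cut the cell's longest side at its midpoint. If every point falls on one side, slide the cut toward the points until it meets the nearest one, so that neither child is empty. The function name and the tie conventions (which side a point lying on the cut goes to) are assumptions.

```python
def sliding_midpoint_split(points, lo, hi):
    """Split the cell [lo, hi] containing `points` by the sliding-midpoint rule.

    Returns (dim, cut, left, right). Points with coordinate <= cut go left, except
    in the slid-down case, where the point(s) on the cut go right.
    """
    dim = max(range(len(lo)), key=lambda i: hi[i] - lo[i])  # longest side of the cell
    cut = (lo[dim] + hi[dim]) / 2
    left = [p for p in points if p[dim] <= cut]
    right = [p for p in points if p[dim] > cut]
    if not left:
        # Trivial split: slide the cut up to the nearest point, which becomes the left child.
        cut = min(p[dim] for p in right)
        left = [p for p in right if p[dim] == cut]
        right = [p for p in right if p[dim] > cut]
    elif not right:
        # Trivial split the other way: slide the cut down to the nearest point.
        cut = max(p[dim] for p in left)
        right = [p for p in left if p[dim] == cut]
        left = [p for p in left if p[dim] < cut]
    return dim, cut, left, right
```

When all points crowd into one corner of a wide cell, the plain midpoint cut would leave one child empty. Sliding the cut instead produces a thin cell holding the single nearest point. Such cells can be skinny, which is the situation the paper's packing argument is concerned with.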
Fast Radial Basis Function Interpolation via Preconditioned Krylov Iteration
Abstract

Cited by 9 (0 self)
Abstract. We consider a preconditioned Krylov subspace iterative algorithm presented by Faul, Goodsell, and Powell (IMA J. Numer. Anal. 25 (2005), pp. 1–24) for computing the coefficients of a radial basis function interpolant over N data points. This preconditioned Krylov iteration has been demonstrated to be extremely robust to the distribution of the points and to converge rapidly. However, the iterative method has several steps whose computational and memory costs scale as O(N^2), both in the preliminary computations that construct the preconditioner and in the matrix-vector product involved in each step of the iteration. We accelerate the iterative method to achieve an overall cost of O(N log N). The matrix-vector product is accelerated via the fast multipole method. The preconditioner requires the computation of a set of closest points to each point, and we develop an O(N log N) algorithm for this step as well. Results are presented for multiquadric interpolation in R^2 and biharmonic interpolation in R^3. A novel FMM algorithm for the evaluation of sums involving multiquadric functions in R^2 is presented as well.
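To see where the O(N^2) costs come from, here is a sketch of the dense baseline. It builds the full N×N multiquadric matrix, solves it directly by Gaussian elimination (O(N^3) time, O(N^2) memory), and evaluates the interpolant as an O(N) sum per point. This is not the Krylov or FMM method of the paper. The shape parameter c = 1, the omission of the constant term that some multiquadric formulations add, and the helper names are our assumptions.

```python
import math


def multiquadric(r, c=1.0):
    """Multiquadric basis function phi(r) = sqrt(r^2 + c^2); c is an assumed shape parameter."""
    return math.sqrt(r * r + c * c)


def solve(a, b):
    """Dense Gaussian elimination with partial pivoting: O(N^3) time, O(N^2) memory."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_rbf(centers, values, c=1.0):
    """Coefficients lam with sum_j lam_j * phi(|x_i - x_j|) = values[i] for every center x_i."""
    phi = [[multiquadric(math.dist(p, q), c) for q in centers] for p in centers]
    return solve(phi, values)


def evaluate(centers, coeffs, x, c=1.0):
    """s(x) = sum_j lam_j * phi(|x - x_j|): O(N) per point, O(N^2) for all N points."""
    return sum(lam * multiquadric(math.dist(x, p), c) for lam, p in zip(coeffs, centers))
```

Evaluating `evaluate` at all N centers is exactly the dense matrix-vector product that each Krylov step needs. That product is the part the abstract accelerates with the fast multipole method.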
Expected-Case Complexity of Approximate Nearest Neighbor Searching
In Proc. 11th ACM-SIAM Sympos. Discrete Algorithms
, 2000
Abstract

Cited by 8 (2 self)
Most research in algorithms for geometric query problems has focused on their worst-case performance. But when information on the query distribution is available, the alternative paradigm of designing and analyzing algorithms from the perspective of expected-case performance appears more attractive. We study the approximate nearest neighbor problem from this point of view. As a first step in this direction, we assume that the query points are chosen uniformly from a hypercube that encloses all the data points; however, we make no assumption on the distribution of data points. We investigate three simple variants of partition trees: sliding-midpoint, balance-split, and hybrid-split trees. We show that with these simple tree-based data structures, it is possible to achieve linear space and logarithmic or polylogarithmic query time in the expected case. In contrast, the data structures known to achieve linear space and logarithmic query time in the worst case are complex, and algorithms on...
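A minimal sketch of (1 + ε)-approximate nearest neighbor search in a partition tree follows. It counts leaves visited, which is the kind of quantity an expected-case analysis bounds. We swapped in a plain median-split kd-tree for brevity; it is not one of the three split rules studied in the paper. The pruning rule skips the far child whenever it cannot contain a point closer than best/(1 + ε).

```python
import math


def build_kdtree(points, depth=0, leaf_size=2):
    """Median-split kd-tree cycling through coordinates (a stand-in for the paper's rules)."""
    if len(points) <= leaf_size:
        return ("leaf", list(points))
    dim = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2
    cut = pts[mid][dim]  # left half has coordinate <= cut, right half >= cut
    return ("split", dim, cut,
            build_kdtree(pts[:mid], depth + 1, leaf_size),
            build_kdtree(pts[mid:], depth + 1, leaf_size))


def ann_search(tree, q, eps=0.0):
    """Return (point, distance, leaves_visited), with distance <= (1 + eps) * true NN distance."""
    best_point, best_dist, leaves = None, math.inf, 0

    def visit(node):
        nonlocal best_point, best_dist, leaves
        if node[0] == "leaf":
            leaves += 1
            for p in node[1]:
                d = math.dist(p, q)
                if d < best_dist:
                    best_point, best_dist = p, d
            return
        _, dim, cut, left, right = node
        diff = q[dim] - cut
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # Every point across the cut is at least |diff| away; visit that side only if
        # it could hold a point closer than best_dist / (1 + eps).
        if abs(diff) * (1 + eps) < best_dist:
            visit(far)

    visit(tree)
    return best_point, best_dist, leaves
```

With `eps=0.0` the search is exact. Larger ε prunes more of the tree, which reduces the number of leaves visited in exchange for a bounded loss in accuracy.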
On the Efficiency of Nearest Neighbor Searching with Data Clustered in Lower Dimensions
, 2001
Abstract

Cited by 6 (0 self)
In nearest neighbor searching we are given a set of n data points in real d-dimensional space, R^d, and the problem is to preprocess these points into a data structure so that, given a query point, the nearest data point to the query point can be reported efficiently. Because data sets can be quite large, we are interested in data structures that use optimal O(dn) storage. Given the limitation of linear storage, the best known data structures suffer from expected-case query times that grow exponentially in d. However, it is widely regarded in practice that data sets in high dimensional spaces tend to consist of clusters residing in much lower dimensional subspaces. This raises the question of whether data structures for nearest neighbor searching adapt to the presence of lower dimensional clustering, and further how performance varies when the clusters are aligned with the coordinate axes. We analyze the popular kd-tree data structure in the form of two variants based on a modification of the splitting method, which produces cells that satisfy the basic packing properties needed for efficiency without producing empty cells. We show that when data points are uniformly distributed on a k-dimensional hyperplane for k ≤ d, the expected number of leaves visited in such a kd-tree grows exponentially in k, but not in d. We show that the growth rate is even smaller if the hyperplane is aligned with the coordinate axes. We present empirical studies to support our theoretical results. Keywords: Nearest neighbor searching, kd-trees, splitting methods, expected-case analysis, clustering.
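To experiment with the setting the analysis describes, one needs data lying on a k-dimensional flat inside R^d, in either a random orientation or aligned with the coordinate axes. The generator below is a hypothetical helper for that purpose. We assume "uniformly distributed" means uniform over a cube in the flat's own coordinates, and we place the flat through the origin. Neither choice is taken from the paper.

```python
import random


def orthonormal_basis(k, d, rng):
    """k random orthonormal vectors in R^d (Gram-Schmidt applied to Gaussian vectors)."""
    basis = []
    while len(basis) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > 1e-9:  # retry in the (unlikely) degenerate case
            basis.append([x / norm for x in v])
    return basis


def points_on_flat(n, k, d, aligned=False, seed=0):
    """n points uniform over [-1, 1]^k, embedded in R^d via a k-dimensional basis.

    aligned=True uses the first k coordinate axes; otherwise the flat is randomly oriented.
    """
    rng = random.Random(seed)
    if aligned:
        basis = [[1.0 if j == i else 0.0 for j in range(d)] for i in range(k)]
    else:
        basis = orthonormal_basis(k, d, rng)
    points = []
    for _ in range(n):
        coeffs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        points.append([sum(c * b[j] for c, b in zip(coeffs, basis)) for j in range(d)])
    return points, basis
```

Feeding both variants into a kd-tree and counting the leaves visited per query is one way to probe the claims that cost grows with k rather than d, and that it grows more slowly in the aligned case.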
A Learning Framework for Nearest Neighbor Search
Abstract

Cited by 6 (1 self)
Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: kd-trees and the rectilinear structures employed by locality-sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures.
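As a toy illustration of "learning the parameters of a data structure from sample queries", the sketch below learns a single kd-tree split threshold. A query misses when a search that descends only into the query's own side of the cut would fail to find the true nearest neighbor. The threshold is chosen to minimize that empirical miss rate. The balance constraint (candidates drawn from the middle half of the data) and all names are our own assumptions, not the learning algorithm from the paper.

```python
import math


def nearest(points, q):
    """Brute-force nearest neighbor (ties go to the earlier point)."""
    return min(points, key=lambda p: math.dist(p, q))


def miss_rate(pairs, dim, cut):
    """Fraction of (query, true NN) pairs separated by the cut along `dim`."""
    misses = sum((q[dim] <= cut) != (nn[dim] <= cut) for q, nn in pairs)
    return misses / len(pairs)


def learn_split(points, sample_queries, dim):
    """Cut along `dim` minimizing the empirical miss rate on the sample queries.

    Candidates are restricted to the middle half of the data coordinates so that both
    children stay reasonably large (an assumed stand-in for a retrieval-time term).
    """
    pairs = [(q, nearest(points, q)) for q in sample_queries]
    coords = sorted(p[dim] for p in points)
    n = len(coords)
    candidates = coords[n // 4: n - n // 4]
    return min(candidates, key=lambda cut: miss_rate(pairs, dim, cut))
```

Without some balance term, the miss rate alone would favour a cut that puts all the data on one side, which removes misses but also removes any speedup. This is why the framework optimizes retrieval time and miss rate together.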