An Efficient k-Means Clustering Algorithm: Analysis and Implementation
, 2000
Cited by 129 (3 self)
K-means clustering is a very popular clustering technique, which is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is very easy to implement. It differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real...
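To make the objective in this abstract concrete, here is a minimal sketch of plain Lloyd's algorithm in Python. It is not the paper's filtering algorithm (no kd-tree is built); the function names, the tuple representation of points, and the caller-supplied starting centers are all assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, centers, max_iterations=100):
    """Plain Lloyd iteration (hypothetical helper, not the filtering algorithm).

    points  -- list of equal-length tuples
    centers -- list of k starting centers, chosen by the caller
    """
    centers = list(centers)
    k = len(centers)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the centroid of its cluster.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
                )
            else:
                new_centers.append(centers[j])  # keep a center that lost all its points
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers


def mean_squared_distance(points, centers):
    """The k-means objective: mean squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points) / len(points)


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers = lloyd_kmeans(points, [(0, 0), (10, 10)])
print(centers, mean_squared_distance(points, centers))
```

Because Lloyd's algorithm is a heuristic, the result depends on the starting centers, and a poor start can leave it at a local minimum of the objective.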
The Analysis of a Simple k-Means Clustering Algorithm
, 2000
Cited by 18 (1 self)
K-means clustering is a very popular clustering technique, which is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is very easy to implement. It differs from most other approaches in that it precomputes a kd-tree data structure for the data points rather than the center points. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time. Second, we have implemented the algorithm and performed a number of empirical studies, both on synthetically generated data and on real data from...
Efficient Expected-Case Algorithms for Planar Point Location
, 2000
Cited by 12 (4 self)
Planar point location is among the most fundamental search problems in computational geometry. Although this problem has been heavily studied from the perspective of worst-case query time, there has been surprisingly little theoretical work on expected-case query time. We are given an n-vertex planar polygonal subdivision S satisfying some weak assumptions (satisfied, for example, by all convex subdivisions). We are to preprocess this into a data structure so that queries can be answered efficiently. We assume that the two coordinates of each query point are generated independently by a probability distribution also satisfying some weak assumptions (satisfied, for example, by the uniform distribution). In the decision tree model of computation, it is well-known from information theory that a lower bound on the expected number of comparisons is entropy(S). We provide two data structures, one of size O(n^2) that can answer queries in 2 entropy(S) + O(1) expected number...
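The abstract does not spell out entropy(S). The sketch below assumes the usual Shannon entropy over the cells of the subdivision, where p_z is the probability that a query point falls in cell z. The function name and the example probabilities are hypothetical.

```python
from math import log2


def subdivision_entropy(cell_probabilities):
    """Shannon entropy, in bits, of the distribution of queries over cells.

    Assumes the probabilities sum to 1; cells with probability 0 contribute nothing.
    """
    return sum(p * log2(1 / p) for p in cell_probabilities if p > 0)


# Four equally likely cells: any decision tree needs 2 comparisons on average.
print(subdivision_entropy([0.25, 0.25, 0.25, 0.25]))

# A skewed distribution has lower entropy, so faster expected queries are possible.
print(subdivision_entropy([0.5, 0.25, 0.25]))
```

Under this assumption, a query bound of 2 entropy(S) + O(1) is within a constant factor of the information-theoretic lower bound.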
R-IAC: Robust Intrinsically Motivated Exploration and Active Learning
- IEEE Transactions on Autonomous Mental Development
Cited by 11 (6 self)
Abstract—Intelligent adaptive curiosity (IAC) was initially introduced as a developmental mechanism allowing a robot to self-organize developmental trajectories of increasing complexity without preprogramming the particular developmental stages. In this paper, we argue that IAC and other intrinsically motivated learning heuristics could be viewed as active learning algorithms that are particularly suited for learning forward models in unprepared sensorimotor spaces with large unlearnable subspaces. Then, we introduce a novel formulation of IAC, called robust intelligent adaptive curiosity (R-IAC), and show that its performances as an intrinsically motivated active learning algorithm are far superior to IAC in a complex sensorimotor space where only a small subspace is neither unlearnable nor trivial. We also show results in which the learnt forward model is reused in a control scheme. Finally, an open source accompanying software containing these algorithms as well as tools to reproduce all the experiments presented in this paper is made publicly available. Index Terms—Active learning, artificial curiosity, developmental robotics, exploration, intrinsic motivation, sensorimotor learning.
Applications and Variations of Domination in Graphs
, 2000
Cited by 9 (0 self)
In a graph G = (V, E), S ⊆ V is a dominating set of G if every vertex is either in S or joined by an edge to some vertex in S. Many different types of domination have been researched extensively. This dissertation explores some new variations and applications of dominating sets. We first introduce the concept of Roman domination. A Roman dominating function is a function f: V → {0, 1, 2} such that every vertex v for which f(v) = 0 has a neighbor w with f(w) = 2. This corresponds to a problem in army placement where every region is either defended by its own army or has a neighbor with two armies, in which case one of the two armies can be sent to the undefended region if a conflict breaks out. The weight of a Roman dominating function f is f(V) = Σ_{v∈V} f(v), and we are interested in finding Roman dominating functions of minimum weight. We explore the graph theoretic, algorithmic, and complexity issues of Roman domination, including algorithms for finding minimum weight Roman dominating functions for trees and grids.
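The definition of a Roman dominating function can be checked directly. The sketch below assumes a graph stored as a dictionary of neighbor lists and finds a minimum weight by exhaustive search. This brute force is only an illustration for tiny graphs and is not one of the dissertation's tree or grid algorithms. All names are hypothetical.

```python
from itertools import product


def is_roman_dominating(graph, f):
    """True if every vertex with f(v) = 0 has a neighbor w with f(w) = 2."""
    return all(f[v] != 0 or any(f[w] == 2 for w in graph[v]) for v in graph)


def weight(f):
    """Weight f(V): the sum of f(v) over all vertices."""
    return sum(f.values())


def minimum_roman_weight(graph):
    """Exhaustive search over all 3^n labelings; feasible only for small n."""
    vertices = list(graph)
    best = None
    for values in product((0, 1, 2), repeat=len(vertices)):
        f = dict(zip(vertices, values))
        if is_roman_dominating(graph, f) and (best is None or weight(f) < best):
            best = weight(f)
    return best


# A path on three vertices: two armies in the middle defend both ends.
path3 = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(is_roman_dominating(path3, {"a": 0, "b": 2, "c": 0}))
print(minimum_roman_weight(path3))
```

Placing a single army on each end of the path instead (f = 1, 0, 1) fails the check, because the middle vertex has no neighbor holding two armies.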
It's Okay to Be Skinny, If Your Friends Are Fat
- Center for Geometric Computing 4th Annual Workshop on Computational Geometry
, 1999
Cited by 7 (2 self)
The kd-tree is a popular and simple data structure for range searching and nearest neighbor searching. Such a tree subdivides space into rectangular cells through the recursive application of some splitting rule. The choice of splitting rule affects the shape of cells and the structure of the resulting tree. It has been shown that an important element in achieving efficient query times for approximate queries is that each cell should be fat, meaning that the ratio of its longest side to shortest side (its aspect ratio) should be bounded. Subdivisions with fat cells satisfy a property called the packing constraint, which bounds the number of disjoint cells of a given size that can overlap a ball of a given radius. We consider a splitting rule called the sliding-midpoint rule. It has been shown to provide efficient search times for approximate nearest neighbor and range searching, both in practice and in terms of expected case query time. However it has not been possible to pro...
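The abstract defines a cell's aspect ratio but does not describe the sliding-midpoint rule itself. The sketch below follows the commonly stated version of the rule, which is an assumption here: cut the cell's longest side at its midpoint, and if all points fall on one side, slide the cut to the nearest point so that neither child is empty. Function names and the tie-handling are hypothetical.

```python
def aspect_ratio(lo, hi):
    """Ratio of a rectangular cell's longest side to its shortest side."""
    sides = [h - l for l, h in zip(lo, hi)]
    return max(sides) / min(sides)


def sliding_midpoint_split(points, lo, hi):
    """Split the points of the cell [lo, hi]; returns (dim, cut, low, high).

    Assumed form of the rule, not taken from the abstract.
    """
    # Cut orthogonally to the longest side of the cell, at its midpoint.
    dim = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    cut = (lo[dim] + hi[dim]) / 2
    coords = [p[dim] for p in points]
    if min(coords) > cut:
        # Every point is above the midpoint: slide the cut up to the lowest point.
        cut = min(coords)
        low = [p for p in points if p[dim] <= cut]
        high = [p for p in points if p[dim] > cut]
    elif max(coords) < cut:
        # Every point is below the midpoint: slide the cut down to the highest point.
        cut = max(coords)
        low = [p for p in points if p[dim] < cut]
        high = [p for p in points if p[dim] >= cut]
    else:
        # Points on both sides (ties at the cut go to the low side).
        low = [p for p in points if p[dim] <= cut]
        high = [p for p in points if p[dim] > cut]
    return dim, cut, low, high


cell_lo, cell_hi = (0.0, 0.0), (4.0, 1.0)
points = [(0.1, 0.2), (0.2, 0.9), (0.3, 0.5)]
print(aspect_ratio(cell_lo, cell_hi))
print(sliding_midpoint_split(points, cell_lo, cell_hi))
```

In this example the midpoint cut at 2.0 would leave the right child empty, so the cut slides to 0.3. The resulting left cell is skinny, while its sibling is a much larger and fatter cell.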
On the Efficiency of Nearest Neighbor Searching with Data Clustered in Lower Dimensions
, 2001
Cited by 6 (0 self)
In nearest neighbor searching we are given a set of n data points in real d-dimensional space, R^d, and the problem is to preprocess these points into a data structure, so that given a query point, the nearest data point to the query point can be reported efficiently. Because data sets can be quite large, we are interested in data structures that use optimal O(dn) storage. Given the limitation of linear storage, the best known data structures suffer from expected-case query times that grow exponentially in d. However, it is widely regarded in practice that data sets in high dimensional spaces tend to consist of clusters residing in much lower dimensional subspaces. This raises the question of whether data structures for nearest neighbor searching adapt to the presence of lower dimensional clustering, and further how performance varies when the clusters are aligned with the coordinate axes. We analyze the popular kd-tree data structure in the form of two variants based on a modification of the splitting method, which produces cells that satisfy the basic packing properties needed for efficiency without producing empty cells. We show that when data points are uniformly distributed on a k-dimensional hyperplane for k d, then the expected number of leaves visited in such a kd-tree grows exponentially in k, but not in d. We show that the growth rate is even smaller still if the hyperplane is aligned with the coordinate axes. We present empirical studies to support our theoretical results. Keywords: Nearest neighbor searching, kd-trees, splitting methods, expected-case analysis, clustering.
FAST RADIAL BASIS FUNCTION INTERPOLATION VIA PRECONDITIONED KRYLOV ITERATION
Cited by 5 (0 self)
Abstract. We consider a preconditioned Krylov subspace iterative algorithm presented by Faul, Goodsell, and Powell (IMA J. Numer. Anal. 25 (2005), pp. 1–24) for computing the coefficients of a radial basis function interpolant over N data points. This preconditioned Krylov iteration has been demonstrated to be extremely robust to the distribution of the points and the iteration rapidly convergent. However, the iterative method has several steps whose computational and memory costs scale as O(N^2), both in preliminary computations that compute the preconditioner and in the matrix-vector product involved in each step of the iteration. We effectively accelerate the iterative method to achieve an overall cost of O(N log N). The matrix-vector product is accelerated via the use of the fast multipole method. The preconditioner requires the computation of a set of closest points to each point. We develop an O(N log N) algorithm for this step as well. Results are presented for multiquadric interpolation in R^2 and biharmonic interpolation in R^3. A novel FMM algorithm for the evaluation of sums involving multiquadric functions in R^2 is presented as well.
Fast algorithms for nearest neighbour search
, 2007
Cited by 1 (1 self)
The nearest neighbour problem is of practical significance in a number of fields. Often we are interested in finding an object near to a given query object. The problem is old, and a large number of solutions have been proposed for it in the literature. However, it remains the case that even the most popular of the techniques proposed for its solution have not been compared against each other. Also, many techniques, including the old and popular ones, can be implemented in a number of ways, and often the different implementations of a technique have not been thoroughly compared either. This research presents a detailed investigation of different implementations of two popular nearest neighbour search data structures, KDTrees and Metric Trees, and compares the different implementations of each of the two structures against each other. The best implementations of these structures are then compared against each other and against two other techniques, Annulus Method and Cover Trees. Annulus Method is an old technique that was rediscovered during the research for this thesis. Cover Trees are one of the most novel and promising data structures for nearest neighbour search that have been proposed in the literature. Acknowledgments: The continued support of the Department of Computer Science's Machine Learning group, and particularly of my supervisor Dr. Eibe Frank, is greatly appreciated; without it this thesis would not have been possible.
A Learning Framework for Nearest Neighbor Search
Cited by 1 (1 self)
Can we leverage learning techniques to build a fast nearest-neighbor (ANN) retrieval data structure? We present a general learning framework for the NN problem in which sample queries are used to learn the parameters of a data structure that minimize the retrieval time and/or the miss rate. We explore the potential of this novel framework through two popular NN data structures: KD-trees and the rectilinear structures employed by locality sensitive hashing. We derive a generalization theory for these data structure classes and present simple learning algorithms for both. Experimental results reveal that learning often improves on the already strong performance of these data structures.

