Results 1 - 5 of 5
Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces
, 1993
Abstract

Cited by 293 (4 self)
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation is very high. Also relevant are high-dimensional Euclidian settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is, under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidian cases, kd-tree performance is compared.
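The vantage point idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a randomly chosen vantage point and a median split, and the names `VPNode`, `build`, and `nearest` are hypothetical. The only requirement on `dist` is that it be a metric, since the pruning relies on the triangle inequality.

```python
import math
import random
import statistics


class VPNode:
    """One node of a (simplified) vantage point tree."""

    def __init__(self, point, radius, inside, outside):
        self.point = point      # the vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance < radius
        self.outside = outside  # subtree of points with distance >= radius


def build(points, dist, rng):
    """Build a tree by recursive median splits (assumed, simplified scheme)."""
    points = list(points)
    if not points:
        return None
    # Assumption: pick the vantage point uniformly at random.
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = [dist(vp, p) for p in points]
    mu = statistics.median(dists)
    inside = [p for p, d in zip(points, dists) if d < mu]
    outside = [p for p, d in zip(points, dists) if d >= mu]
    return VPNode(vp, mu, build(inside, dist, rng), build(outside, dist, rng))


def nearest(node, query, dist, best=None):
    """Return (distance, point) of the nearest neighbor of `query`."""
    if node is None:
        return best
    d = dist(query, node.point)
    if best is None or d < best[0]:
        best = (d, node.point)
    if d < node.radius:
        # Query lies inside the ball: search inside first.
        best = nearest(node.inside, query, dist, best)
        # Triangle inequality: outside can only help if d + best >= radius.
        if d + best[0] >= node.radius:
            best = nearest(node.outside, query, dist, best)
    else:
        best = nearest(node.outside, query, dist, best)
        # Inside can only help if d - best < radius.
        if d - best[0] < node.radius:
            best = nearest(node.inside, query, dist, best)
    return best
```

For example, `nearest(build(points, math.dist, random.Random(0)), query, math.dist)` searches a set of coordinate tuples under the Euclidean metric; any other metric function can be passed in place of `math.dist`.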
Rough Sets: A Tutorial
, 1998
Abstract

Cited by 67 (6 self)
A rapid growth of interest in rough set theory [290] and its applications has lately been seen in the number of international workshops, conferences and seminars that are either directly dedicated to rough sets, include the subject in their programs, or simply accept papers that use this approach to solve problems at hand. A large number of high-quality papers on various aspects of rough sets and their applications have been published in recent years as a result of this attention. The theory has been followed by the development of several software systems that implement rough set operations. In Section 12 we present a list of software systems based on rough sets. Some of the toolkits provide advanced graphical environments that support the process of developing and validating rough set classifiers. Rough sets are applied in many domains, such as medicine, finance, telecommunication, vibration analysis, conflict resolution, intelligent agents, image analysis, p...
An Empirical Investigation of Brute Force to choose Features, Smoothers and Function Approximators
 Computational Learning Theory and Natural Learning Systems
, 1992
Abstract

Cited by 42 (10 self)
The generalization error of a function approximator, feature set or smoother can be estimated directly by the leave-one-out cross-validation error. For memory-based methods, this is computationally feasible. We describe an initial version of a general memory-based learning system (GMBL): a large collection of learners brought into a widely applicable machine-learning family. We present ongoing investigations into search algorithms which, given a dataset, find the family members and features that generalize best. We also describe GMBL's application to two noisy, difficult problems: predicting car engine emissions from pressure waves, and controlling a robot billiards player with redundant state variables.
1 Introduction
The main engineering benefit of machine learning is its application to autonomous systems in which human decision making is minimized. Function approximation plays a large and successful role in this process. However, many other human decisions are needed even for si...
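Leave-one-out cross-validation for a memory-based learner can be sketched as follows. This is an illustration under stated assumptions, not the GMBL system: the learner is a plain k-nearest-neighbor averager on scalar inputs, and the names `knn_predict`, `loocv_error`, and `best_k` are hypothetical. Because a memory-based method has no training step, leaving a point out only means excluding it from the stored data.

```python
def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k stored points closest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    chosen = order[:k]
    return sum(train_y[i] for i in chosen) / len(chosen)


def loocv_error(xs, ys, k):
    """Mean squared leave-one-out error of the k-NN averager."""
    total = 0.0
    for i in range(len(xs)):
        # Hold out point i and predict it from all remaining points.
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        prediction = knn_predict(rest_x, rest_y, xs[i], k)
        total += (prediction - ys[i]) ** 2
    return total / len(xs)


def best_k(xs, ys, candidates):
    """Brute-force choice: try every candidate k, keep the lowest error."""
    return min(candidates, key=lambda k: loocv_error(xs, ys, k))
```

The same brute-force loop extends to choosing among feature subsets or smoothers by scoring each candidate with `loocv_error` and keeping the minimum.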
Locally Lifting the Curse of Dimensionality for Nearest Neighbor Search (Extended Abstract)
 IN PROC. 11TH ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS (SODA'00)
, 1999
Abstract

Cited by 25 (1 self)
We consider the problem of nearest neighbor search in the Euclidean hypercube [-1, +1]^d with uniform distributions, and the additional natural assumption that the nearest neighbor is located within a constant fraction R of the maximum interpoint distance in this space, i.e. within distance 2R√d of the query. We introduce the idea of aggressive pruning and give a family of practical algorithms, an idealized analysis, and describe experiments. Our main result is that search complexity, measured in terms of d-dimensional inner product operations, is i) strongly sublinear with respect to the data set size n for moderate R, ii) asymptotically, and as a practical matter, independent of dimension. Given a random data set, a random query within distance 2R√d of some database element, and a randomly constructed data structure, the search succeeds with a specified probability, which is a parameter of the search algorithm. On average a search performs...
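The distance bound quoted in the abstract follows from the geometry of the cube, assuming the hypercube is [-1, +1]^d so that each side has length 2. The two farthest points are opposite corners, which differ by 2 in every coordinate:

```latex
\max_{x,\,y \in [-1,+1]^d} \lVert x - y \rVert_2
  = \sqrt{\sum_{i=1}^{d} 2^2}
  = 2\sqrt{d},
\qquad\text{so a fraction } R \text{ of this distance is } 2R\sqrt{d}.
```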
A Source Coding Approach to Classification by Vector Quantization and the Principle of Minimum Description Length
Abstract

Cited by 2 (0 self)
An algorithm for supervised classification using vector quantization and entropy coding is presented. The classification rule is formed from a set of training data {(X_i, Y_i)}_{i=1}^n, which are independent samples from a joint distribution P_XY. Based on the principle of Minimum Description Length (MDL), a statistical model that approximates the distribution P_XY ought to enable efficient coding of X and Y. On the other hand, we expect a system that encodes (X, Y) efficiently to provide ample information on the distribution P_XY. This information can then be used to classify X, i.e., to predict the corresponding Y based on X. To encode both X and Y, a two-stage vector quantizer is applied to X and a Huffman code is formed for Y conditioned on each quantized value of X. The optimization of the encoder is equivalent to the design of a vector quantizer with an objective function reflecting the joint penalty of quantization error and misclassification rate. This vector quantizer provides an estimation of the conditional distribution of Y given X, which in turn yields an approximation to the Bayes classification rule. This algorithm, namely Discriminant Vector Quantization (DVQ), is compared with Learning Vector Quantization (LVQ) and CART® on a number of data sets. DVQ outperforms the other two on several data sets. The relation between DVQ, density estimation, and regression is also discussed.
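The link between quantizing X and classifying Y can be sketched in a few lines. This is not DVQ itself: it assumes a fixed, given scalar codebook rather than the paper's two-stage quantizer, it does not optimize the joint objective, and the names `quantize`, `fit_cell_labels`, `classify`, and `conditional_entropy_bits` are hypothetical. It shows only how per-cell label counts estimate the distribution of Y given the quantized X, and how that estimate yields a plug-in classification rule.

```python
import math
from collections import Counter, defaultdict


def quantize(x, codebook):
    """Index of the codeword nearest to x (squared-error distortion)."""
    return min(range(len(codebook)), key=lambda j: (x - codebook[j]) ** 2)


def fit_cell_labels(xs, ys, codebook):
    """Count the labels that fall into each quantizer cell."""
    counts = defaultdict(Counter)
    for x, y in zip(xs, ys):
        counts[quantize(x, codebook)][y] += 1
    return counts


def classify(x, codebook, counts):
    """Predict the most frequent training label in x's cell (None if empty)."""
    cell = counts.get(quantize(x, codebook))
    if not cell:
        return None
    return cell.most_common(1)[0][0]


def conditional_entropy_bits(counts):
    """Empirical entropy of Y given the cell, in bits per sample.

    This lower-bounds the average length of any prefix code for Y,
    Huffman included, that is conditioned on the quantized X.
    """
    total = sum(sum(cell.values()) for cell in counts.values())
    bits = 0.0
    for cell in counts.values():
        n_cell = sum(cell.values())
        for n_label in cell.values():
            bits -= (n_label / total) * math.log2(n_label / n_cell)
    return bits
```

A codebook whose cells are nearly pure in Y gives both a low conditional entropy and a low error for `classify`, which is the intuition behind coupling coding cost and misclassification in one objective.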