Results 1 - 8 of 8
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
, 1998
Abstract

Cited by 715 (33 self)
The nearest neighbor problem is the following: Given a set of n points $P = \{p_1, \ldots, p_n\}$ in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point $q \in X$. We focus on the particularly interesting case of the d-dimensional Euclidean space where $X = \mathbb{R}^d$ under some $\ell_p$ norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large d, in theory or in practice, they provide little improvement over the brute-force algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point $p \in P$ that is an $\epsilon$-approximate nearest neighbor of the query q in that for all $p' \in P$, $d(p, q) \le (1 + \epsilon)\, d(p', q)$. We present two algorithmic results for the approximate version that significantly improve the known bounds: (a) preprocessing cost polynomial in n and d, and a trul...
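The two definitions in this abstract can be made concrete with a small sketch. It shows only the brute-force baseline and a check of the approximate-nearest-neighbor condition, not the paper's algorithms; the function names and the choice of the Euclidean norm are assumptions made for illustration.

```python
import math

def dist(p, q):
    """Euclidean (l_2) distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_neighbor(points, q):
    """Brute force: compare the query q to every data point."""
    return min(points, key=lambda p: dist(p, q))

def is_eps_approximate_nn(p, points, q, eps):
    """True if d(p, q) <= (1 + eps) * d(p', q) holds for every p' in points."""
    return all(dist(p, q) <= (1 + eps) * dist(other, q) for other in points)

points = [(0.0, 0.0), (3.0, 0.0), (10.0, 10.0)]
q = (1.0, 0.0)
exact = nearest_neighbor(points, q)  # the point at distance 1 from q
# (3, 0) is at distance 2, so it qualifies for eps = 1.5 but not for eps = 0.5.
loose = is_eps_approximate_nn((3.0, 0.0), points, q, eps=1.5)
tight = is_eps_approximate_nn((3.0, 0.0), points, q, eps=0.5)
```

The brute-force scan costs one distance computation per data point, which is the baseline the abstract says existing methods barely improve on for large d.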
Accounting for Boundary Effects in Nearest Neighbor Searching
, 1995
Abstract

Cited by 33 (4 self)
Given n data points in d-dimensional space, nearest neighbor searching involves determining the nearest of these data points to a given query point. Most average-case analyses of nearest neighbor searching algorithms are made under the simplifying assumption that d is fixed and that n is so large relative to d that boundary effects can be ignored. This means that for any query point the statistical distribution of the data points surrounding it is independent of the location of the query point. However, in many applications of nearest neighbor searching (such as data compression by vector quantization) this assumption is not met, since the number of data points n grows roughly as 2^d. Largely for this reason, the actual performances of many nearest neighbor algorithms tend to be much better than their theoretical analyses would suggest. We present evidence of why this is the case. We provide an accurate analysis of the number of cells visited in nearest neighbor searching by the buck...
Automatic Class Selection and Prototyping for 3D Object Databases
 Proc. Int’l Conf. 3D Digital Imaging and Modeling
, 2003
Abstract

Cited by 3 (1 self)
Most research on 3D object classification and recognition focuses on recognition of objects in 3D scenes from a small database of known 3D models. Such an approach does not scale well to large databases of objects and does not generalize well to unknown (but similar) object classification. This paper presents two ideas to address these problems: (i) class selection, i.e., grouping similar objects into classes; and (ii) class prototyping, i.e., exploiting common structure within classes to represent the classes. At run time, matching a query against the prototypes is sufficient for classification. This approach will not only reduce the retrieval time but will also help increase the generalizing power of the classification algorithm. Objects are segmented into classes automatically using an agglomerative clustering algorithm. Prototypes from these classes are extracted using one of three class prototyping algorithms. Experimental results demonstrate the effectiveness of the two steps in speeding up the classification process without sacrificing accuracy.
Efficient {0,1}-String Searching Based on Preclustering
, 1996
Abstract
In this paper we consider the {0,1}-string searching problem. For a given set S of binary strings of fixed length d and a query string q one asks for the most similar string in S. Thereby the dissimilarity of two given strings is the number of disagreeing bits, that is, their Hamming distance. We present an efficient {0,1}-string searching algorithm based on hierarchical preclustering. To this end we give several useful observations on the inter- and intracluster distances. The presented algorithms are easy to implement and we give exhaustive experimental results for uniformly distributed sets as well as for specially chosen strings. These experiments indicate that our algorithms work well in practice.
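The problem statement in this abstract can be illustrated with a brute-force baseline. This is only a sketch of the problem definition (Hamming distance and most-similar-string search), not the paper's preclustering algorithm; the function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of disagreeing bits between two binary strings of equal length."""
    if len(a) != len(b):
        raise ValueError("strings must have the same fixed length d")
    return sum(x != y for x, y in zip(a, b))

def most_similar(S, q):
    """Brute force: return a string in S with minimum Hamming distance to q."""
    return min(S, key=lambda s: hamming(s, q))

S = ["0000", "1111", "1010"]
q = "1011"
best = most_similar(S, q)  # ties are broken by the order of S
```

A linear scan like this examines every string in S for each query, which is the cost a preclustering scheme aims to reduce.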
PARTIAL-MATCH RETRIEVAL ALGORITHMS
Abstract
We examine the efficiency of hash-coding and tree-search algorithms for retrieving from a file of k-letter words all words which match a partially-specified input query word (for example, retrieving all six-letter English words of the form S**R*H where "*" is a "don’t care" character). We precisely characterize those balanced hash-coding algorithms with minimum average number of lists examined. Use of the first few letters of each word as a list index is shown to be one such optimal algorithm. A new class of combinatorial designs (called associative block designs) provides better hash functions with a greatly reduced worst-case number of lists examined, yet with optimal average behavior maintained. Another efficient variant involves storing each word in several lists. Tree-search algorithms are shown to be approximately as efficient as hash-coding algorithms, on the average. In general, these algorithms require time about $O(n^{(k-s)/k})$ to respond to a query word with s letters specified, given a file of n k-letter words. Previous algorithms either required time $O(s \cdot n/k)$ or else used exorbitant amounts of storage.
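The idea of using the first few letters of each word as a list index can be sketched as follows. This is a minimal illustration under assumed names (`build_index`, `partial_match`, `prefix_len`), not the paper's balanced hash functions or associative block designs.

```python
from collections import defaultdict

def matches(pattern: str, word: str) -> bool:
    """True if word agrees with pattern at every position that is not '*'."""
    return len(pattern) == len(word) and all(
        c == "*" or c == w for c, w in zip(pattern, word)
    )

def build_index(words, prefix_len=1):
    """Store each word in the list keyed by its first prefix_len letters."""
    index = defaultdict(list)
    for w in words:
        index[w[:prefix_len]].append(w)
    return index

def partial_match(index, pattern, prefix_len=1):
    """Examine only the lists whose key is compatible with the pattern's prefix."""
    key_pattern = pattern[:prefix_len]
    results = []
    for key, bucket in index.items():
        if matches(key_pattern, key):
            results.extend(w for w in bucket if matches(pattern, w))
    return results

words = ["SEARCH", "STARCH", "SWITCH", "MARKET"]
index = build_index(words)
found = sorted(partial_match(index, "S**R*H"))
```

When the pattern specifies its leading letters, only one list is examined; when those positions are "don't care" characters, every list must be scanned, which is the worst case the abstract's combinatorial designs are meant to reduce.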
in Classification and Data Mining
Abstract
This book contains articles on current problems of classification, data mining and forecasting as well as natural language processing: new approaches, models, algorithms and methods for classification, forecasting and clusterisation. Classification of incomplete and noisy data; discrete optimization in logic recognition algorithms construction, complexity, asymptotically optimal algorithms, mixed-integer problem of minimization of empirical risk, multi-objective linear integer programming problems; questions of complexity of some discrete optimization tasks and corresponding tasks of data analysis and pattern recognition; the algebraic approach for pattern recognition problems of correct classification algorithms construction, logical correctors and resolvability of challenges of classification, construction of optimum algebraic correctors over sets of algorithms of computation of estimations, conditions of correct algorithms existence; regressions, restoring of dependences according to training sampling, parametrical approach for piecewise linear dependences restoration, and nonparametric regressions based on collective solution on set of tasks of recognition; multi-agent systems in knowledge discovery, collective evolutionary systems, advantages and disadvantages of synthetic data mining methods, intelligent search agent model realizing information extraction on ontological model of data mining methods; methods of search of logic regularities sets of classes and extraction of optimal subsets, construction of convex combination of associated predictors that minimizes mean error; algorithmic constructions in a model of recognizing the nearest neighbors in binary data sets, discrete isoperimetry problem solutions, logic-combinatorial scheme in high-throughput gene expression data; research in the area of neural network classifiers, and applications in the finance field; text mining, automatic classification of scientific papers, information extraction from natural language texts, semantic text analysis,