Results 1 – 10 of 13
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
, 1998
Abstract

Cited by 759 (35 self)
The nearest neighbor problem is the following: Given a set of n points P = {p1, …, pn} in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q ∈ X. We focus on the particularly interesting case of the d-dimensional Euclidean space, where X = ℝ^d under some l_p norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large d, in theory or in practice, they provide little improvement over the brute-force algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point p ∈ P that is an ε-approximate nearest neighbor of the query q, in that for all p′ ∈ P, d(p, q) ≤ (1 + ε)d(p′, q). We present two algorithmic results for the approximate version that significantly improve the known bounds: (a) preprocessing cost polynomial in n and d, and a trul...
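The ε-approximate acceptance condition from this abstract can be sketched directly. This is a toy illustration of the definition only, not the paper's algorithm (whose whole point is to avoid the brute-force scan used here):

```python
import math

def nearest(points, q):
    """Brute-force nearest neighbor: compare q to every data point."""
    return min(points, key=lambda p: math.dist(p, q))

def is_eps_approximate(points, q, p, eps):
    """The paper's condition: p is an eps-approximate nearest neighbor of q
    if d(p, q) <= (1 + eps) * d(p', q) for all p' in the point set."""
    d_p = math.dist(p, q)
    return all(d_p <= (1 + eps) * math.dist(pp, q) for pp in points)

P = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
q = (0.9, 1.2)
exact = nearest(P, q)                              # the true nearest neighbor
assert is_eps_approximate(P, q, exact, 0.0)        # the exact NN satisfies eps = 0
assert is_eps_approximate(P, q, (0.0, 0.0), 6.0)   # a farther point can pass for large eps
```

Note that any answer returned by an ε-approximate scheme degrades gracefully: as ε → 0 the condition forces the exact nearest neighbor.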
Accounting for Boundary Effects in Nearest Neighbor Searching
, 1995
Abstract

Cited by 37 (4 self)
Given n data points in d-dimensional space, nearest neighbor searching involves determining the nearest of these data points to a given query point. Most average-case analyses of nearest neighbor searching algorithms are made under the simplifying assumption that d is fixed and that n is so large relative to d that boundary effects can be ignored. This means that for any query point the statistical distribution of the data points surrounding it is independent of the location of the query point. However, in many applications of nearest neighbor searching (such as data compression by vector quantization) this assumption is not met, since the number of data points n grows roughly as 2^d. Largely for this reason, the actual performances of many nearest neighbor algorithms tend to be much better than their theoretical analyses would suggest. We present evidence of why this is the case. We provide an accurate analysis of the number of cells visited in nearest neighbor searching by the buck...
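The bucketing approach whose cell-visit count this paper analyzes can be sketched in two dimensions: hash the points into grid cells, then search the query's cell and expanding rings of neighboring cells until no closer point can exist. The cell size, ring expansion, and stopping rule here are illustrative choices, not the paper's construction:

```python
import math
from collections import defaultdict

def build_buckets(points, cell):
    """Bucketing: hash each 2-D point into the grid cell containing it."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(int(math.floor(c / cell)) for c in p)
        buckets[key].append(p)
    return buckets

def bucket_nn(buckets, cell, q, max_ring=5):
    """Search the query's cell, then rings of surrounding cells, stopping
    once every cell in a ring is provably farther than the best found."""
    qk = tuple(int(math.floor(c / cell)) for c in q)
    best, best_d = None, float("inf")
    for r in range(max_ring + 1):
        if best is not None and (r - 1) * cell > best_d:
            break  # every cell at ring r is at least (r-1)*cell away from q
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                if max(abs(dx), abs(dy)) != r:
                    continue  # visit only the boundary cells of ring r
                for p in buckets.get((qk[0] + dx, qk[1] + dy), ()):
                    d = math.dist(p, q)
                    if d < best_d:
                        best, best_d = p, d
    return best
```

The number of cells such a search visits before the stopping rule fires is exactly the quantity whose boundary-dependent behavior the abstract discusses.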
Automatic Class Selection and Prototyping for 3D Object Databases
 Proc. Int’l Conf. 3D Digital Imaging and Modeling
, 2003
Abstract

Cited by 3 (1 self)
Most research on 3D object classification and recognition focuses on recognition of objects in 3D scenes from a small database of known 3D models. Such an approach does not scale well to large databases of objects and does not generalize well to classifying unknown (but similar) objects. This paper presents two ideas to address these problems: (i) class selection, i.e., grouping similar objects into classes, and (ii) class prototyping, i.e., exploiting common structure within classes to represent them. At run time, matching a query against the prototypes is sufficient for classification. This approach not only reduces the retrieval time but also helps increase the generalizing power of the classification algorithm. Objects are segmented into classes automatically using an agglomerative clustering algorithm. Prototypes for these classes are extracted using one of three class prototyping algorithms. Experimental results demonstrate the effectiveness of the two steps in speeding up the classification process without sacrificing accuracy.
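The two-step pipeline described above (agglomerative class selection offline, then prototype matching at run time) might be sketched as follows on plain feature vectors. Single-linkage merging and centroid prototypes are illustrative stand-ins: the paper's actual 3D shape descriptors and its three prototyping algorithms are not specified here:

```python
import math

def agglomerative(vectors, k):
    """Bottom-up clustering: repeatedly merge the two closest clusters
    (single linkage) until only k clusters remain."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

def prototypes(clusters):
    """One simple prototyping choice: the centroid of each class."""
    return [tuple(sum(c) / len(c) for c in zip(*cl)) for cl in clusters]

def classify(protos, q):
    """At run time, match the query against the prototypes only."""
    return min(range(len(protos)), key=lambda i: math.dist(protos[i], q))
```

With c classes, run-time classification costs O(c) prototype comparisons instead of O(n) object comparisons, which is the retrieval-time saving the abstract claims.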
PARTIAL-MATCH RETRIEVAL ALGORITHMS*
Abstract
Abstract. We examine the efficiency of hash-coding and tree-search algorithms for retrieving from a file of k-letter words all words which match a partially-specified input query word (for example, retrieving all six-letter English words of the form S**R*H, where "*" is a "don't care" character). We precisely characterize those balanced hash-coding algorithms with minimum average number of lists examined. Use of the first few letters of each word as a list index is shown to be one such optimal algorithm. A new class of combinatorial designs (called associative block designs) provides better hash functions with a greatly reduced worst-case number of lists examined, yet with optimal average behavior maintained. Another efficient variant involves storing each word in several lists. Tree-search algorithms are shown to be approximately as efficient as hash-coding algorithms, on the average. In general, these algorithms require time about O(n^((k−s)/k)) to respond to a query word with s letters specified, given a file of n k-letter words. Previous algorithms either required time O(s·n/k) or else used exorbitant amounts of storage.
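The optimal scheme the abstract names, using the first few letters of each word as the list index, can be sketched directly: only the lists whose index is consistent with the specified letters of the pattern need to be examined. The two-letter prefix and the uppercase alphabet are illustrative assumptions:

```python
import itertools
import string
from collections import defaultdict

def build_index(words, prefix_len=2):
    """Hash-coding with the first prefix_len letters of each word as the
    list index -- the balanced scheme shown to be optimal on average."""
    lists = defaultdict(list)
    for w in words:
        lists[w[:prefix_len]].append(w)
    return lists

def partial_match(lists, pattern, prefix_len=2):
    """Answer a partially specified query like 'S**R*H' ('*' = don't care),
    examining only lists consistent with the pattern's specified letters."""
    choices = [string.ascii_uppercase if c == "*" else c
               for c in pattern[:prefix_len]]
    hits = []
    for key in itertools.product(*choices):
        for w in lists.get("".join(key), ()):
            if len(w) == len(pattern) and all(p in ("*", c) for p, c in zip(pattern, w)):
                hits.append(w)
    return hits
```

A specified letter inside the prefix prunes the candidate lists by a factor of the alphabet size; a "*" there forces all letters to be tried, which is why the cost depends on how many of the s specified positions the index covers.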
von
, 2001
Abstract
I would like to express my thanks to all the people who supported me during the past years while I have been working on this thesis. My warmest thanks go to Professor Dr. Hans-Peter Kriegel. He took particular care to maintain a good working atmosphere within the group and to provide a supportive and inspiring environment. I am grateful to Professor Dr. Stefan Conrad and to Professor Dr. Gerhard Weikum from the University of Saarland, who were both readily willing to act as referees for this work. This work could not have grown and matured without the discussions with my colleagues. In particular I would like to mention Dr. Bernhard Braunmüller and Florian Krebs; most of my publications were done in close collaboration with them. All my other colleagues from the database systems group of the University of Munich also contributed to my work through fruitful discussions and joint publications. I would like to thank
Geometry © 1996 Springer-Verlag New York Inc. Accounting for Boundary Effects in Nearest-Neighbor Searching∗
Abstract
Abstract. Given n data points in d-dimensional space, nearest-neighbor searching involves determining the nearest of these data points to a given query point. Most average-case analyses of nearest-neighbor searching algorithms are made under the simplifying assumption that d is fixed and that n is so large relative to d that boundary effects can be ignored. This means that for any query point the statistical distribution of the data points surrounding it is independent of the location of the query point. However, in many applications of nearest-neighbor searching (such as data compression by vector quantization) this assumption is not met, since the number of data points n grows roughly as 2^d. Largely for this reason, the actual performances of many nearest-neighbor algorithms tend to be much better than their theoretical analyses would suggest. We present evidence of why this is the case. We provide an accurate analysis of the number of cells visited in nearest-neighbor searching by the bucketing and k-d tree algorithms. We assume m^d points uniformly distributed in dimension d, where m is a fixed integer ≥ 2. Further, we assume that distances are measured in the L∞ metric. Our analysis is tight in the limit as d approaches infinity.
AN ANALYSIS OF OPTIMAL RETRIEVAL SYSTEMS WITH UPDATES
, 1974
Abstract
The performance of computer-implemented systems for data storage, retrieval, and update is investigated. A data structure is modeled by a set D = {d1, d2, …, d|D|} of data bases. A set of questions A = {X1, X2, …} about any d ∈ D may be answered. A memory that is bit-addressable by an algorithm or an automaton models a computer. A retrieval system is composed of a particular mapping of data bases onto memory representations and a particular algorithm or automaton. By accessing bits of memory the algorithm can answer any X ∈ A about the d represented in memory and can update memory to represent a new d* ∈ D. Lower bounds are derived for the performance measures of storage efficiency, retrieval efficiency, and update efficiency. The minima are simultaneously
in Classification and Data Mining
Abstract
This book contains articles on current problems of classification, data mining, and forecasting, as well as natural language processing: new approaches, models, algorithms, and methods for classification, forecasting, and clustering. Topics include: classification of incomplete and noisy data; discrete optimization in the construction of logic recognition algorithms, complexity, asymptotically optimal algorithms, the mixed-integer problem of minimizing empirical risk, and multi-objective linear integer programming problems; questions of the complexity of some discrete optimization tasks and the corresponding tasks of data analysis and pattern recognition; the algebraic approach to pattern recognition problems of constructing correct classification algorithms, logical correctors and the resolvability of classification challenges, construction of optimal algebraic correctors over sets of estimate-computing algorithms, and conditions for the existence of correct algorithms; regression, restoring dependences from training samples, a parametric approach to restoring piecewise-linear dependences, and nonparametric regressions based on collective solutions over sets of recognition tasks; multi-agent systems in knowledge discovery, collective evolutionary systems, advantages and disadvantages of synthetic data mining methods, and an intelligent search agent model performing information extraction on an ontological model of data mining methods; methods for finding sets of logical regularities of classes and extracting optimal subsets, and construction of a convex combination of associated predictors that minimizes mean error; algorithmic constructions in a model of recognition by nearest neighbors in binary data sets, solutions of the discrete isoperimetry problem, and a logic-combinatorial scheme for high-throughput gene expression data; research in the area of neural network classifiers, with applications in finance; and text mining, automatic classification of scientific papers, information extraction from natural language texts, semantic text analysis,