Results 1-10 of 42
Optimal Multi-Step k-Nearest Neighbor Search
, 1998
Abstract

Cited by 169 (21 self)
For an increasing number of modern database applications, efficient support of similarity search becomes an important task. Along with the complexity of objects such as images, molecules, and mechanical parts, the complexity of the similarity models keeps increasing. Whereas algorithms that are directly based on indexes work well for simple medium-dimensional similarity distance functions, they do not meet the efficiency requirements of complex high-dimensional and adaptable distance functions. A multi-step query processing strategy is recommended in these cases, and our investigations substantiate that the number of candidates produced in the filter step and exactly evaluated in the refinement step is a fundamental efficiency parameter. After revealing the strong performance shortcomings of the state-of-the-art algorithm for k-nearest neighbor search [Korn et al. 1996], we present a novel multi-step algorithm which is guaranteed to produce the minim...
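The filter-and-refine idea behind multi-step k-nearest neighbor search can be sketched as follows. This is a minimal Python sketch. It assumes a filter distance that is a lower bound of the exact distance. It ranks all objects by that cheap distance and refines them in order. It stops as soon as the next lower bound exceeds the current k-th exact distance. The names multistep_knn, euclidean and projected are hypothetical. The sketch is not claimed to be the paper's algorithm, and it does not reproduce the paper's optimality argument. In a real system the ranking would normally come incrementally from an index rather than from a full sort.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Dist = Callable[[Vector, Vector], float]


def multistep_knn(query: Vector, objects: List[Vector], k: int,
                  filter_dist: Dist, exact_dist: Dist) -> Tuple[List[Tuple[float, int]], int]:
    """Filter-and-refine k-NN search (hypothetical sketch).

    Assumes filter_dist(q, o) <= exact_dist(q, o) for all q, o.
    Returns the k nearest objects as (exact distance, index) pairs, plus the
    number of exact (refinement) distance evaluations performed.
    """
    if k <= 0:
        return [], 0
    # Filter step: rank every object by the cheap lower-bounding distance.
    ranking = sorted((filter_dist(query, o), i) for i, o in enumerate(objects))
    best: List[Tuple[float, int]] = []  # max-heap of (-exact distance, index)
    refinements = 0
    for lower_bound, i in ranking:
        # Once the next lower bound exceeds the current k-th exact distance,
        # no remaining candidate can enter the result, so stop refining.
        if len(best) == k and lower_bound > -best[0][0]:
            break
        d = exact_dist(query, objects[i])  # refinement step
        refinements += 1
        if len(best) < k:
            heapq.heappush(best, (-d, i))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in best), refinements


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def projected(a: Vector, b: Vector) -> float:
    # Distance on the first two coordinates only: a lower bound of euclidean().
    return euclidean(a[:2], b[:2])
```

For example, take 4-dimensional points with projected() as the filter. The call multistep_knn(q, data, 5, projected, euclidean) should return the same five neighbors as a full scan. The second return value reports how many exact distances were actually computed, which is the efficiency parameter the abstract emphasizes.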
Similarity Indexing: Algorithms and Performance
 In Proceedings SPIE Storage and Retrieval for Image and Video Databases
, 1996
Abstract

Cited by 113 (1 self)
Efficient indexing support is essential to allow content-based image and video databases using similarity-based retrieval to scale to large databases (tens of thousands up to millions of images). In this paper, we take an in-depth look at this problem. One of the major difficulties in solving this problem is the high dimension (6-100) of the feature vectors that are used to represent objects. We provide an overview of the work in computational geometry on this problem and highlight the results we found most useful in practice, including the use of approximate nearest neighbor algorithms. We also present a variant of the optimized k-d tree we call the VAM k-d tree, and provide algorithms to create an optimized R-tree we call the VAMSplit R-tree. We found that the VAMSplit R-tree provided better overall performance than all competing structures we tested for main memory and secondary memory applications. We observed large improvements in performance relative to the R*-tree and SS-tree in secondary memory applications, and modest improvements relative to optimized k-d tree variants.
Nearest Neighbor Search Algorithms for Fast Vector Quantization
 Proc. of DCC '93: Data Compression Conference
, 1993
Abstract

Cited by 65 (12 self)
Nearest neighbor searching is an important geometric subproblem in vector quantization.
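In vector quantization, encoding an input vector means finding the index of its closest codeword, which is a nearest neighbor query. The Python sketch below is a rough illustration only. It builds a k-d tree over the codebook, splitting on the coordinate of largest spread at the median codeword, which is a common heuristic assumed here. It then prunes any subtree whose splitting plane lies farther away than the best match found so far. It is not claimed to be the method of this paper. The names build, nearest and encode are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class KDNode:
    index: int                 # codeword stored at this node
    dim: int                   # splitting coordinate
    left: Optional["KDNode"]   # codewords with coordinate <= split value
    right: Optional["KDNode"]  # codewords with coordinate >= split value


def build(codebook: List[Vector], ids: Optional[List[int]] = None) -> Optional[KDNode]:
    if ids is None:
        ids = list(range(len(codebook)))
    if not ids:
        return None
    # Split on the coordinate with the largest spread, at the median codeword.
    dim = max(range(len(codebook[0])),
              key=lambda j: max(codebook[i][j] for i in ids) - min(codebook[i][j] for i in ids))
    ids = sorted(ids, key=lambda i: codebook[i][dim])
    mid = len(ids) // 2
    return KDNode(ids[mid], dim, build(codebook, ids[:mid]), build(codebook, ids[mid + 1:]))


def nearest(node: Optional[KDNode], codebook: List[Vector], x: Vector,
            best: Tuple[float, int] = (float("inf"), -1)) -> Tuple[float, int]:
    """Return (squared distance, index) of the codeword closest to x."""
    if node is None:
        return best
    c = codebook[node.index]
    d2 = sum((a - b) ** 2 for a, b in zip(x, c))
    if d2 < best[0]:
        best = (d2, node.index)
    diff = x[node.dim] - c[node.dim]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, codebook, x, best)
    # The far subtree can only help if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, codebook, x, best)
    return best


def encode(tree: Optional[KDNode], codebook: List[Vector], x: Vector) -> int:
    """Vector quantization: map x to the index of its nearest codeword."""
    return nearest(tree, codebook, x)[1]
```

The tree is built once per codebook and reused for every input vector. The search remains exact, so encode() agrees with a full scan over the codebook.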
An efficient k-means clustering algorithm
 In Proceedings of IPPS/SPDP Workshop on High Performance Data Mining
, 1998
Abstract

Cited by 56 (0 self)
In this paper, we present a novel algorithm for performing k-means clustering. It organizes all the patterns in a k-d tree structure such that one can efficiently find all the patterns that are closest to a given prototype. The main intuition behind our approach is as follows. All the prototypes are potential candidates for the closest prototype at the root level. However, for the children of the root node, we may be able to prune the candidate set by using simple geometrical constraints. This approach can be applied recursively until the size of the candidate set is one for each node. Our experimental results demonstrate that our scheme can improve the computational speed of the direct k-means algorithm by one to two orders of magnitude in the total number of distance calculations and the overall time of computation.
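The candidate-pruning idea can be sketched under one assumed geometric constraint. For each k-d tree node, a prototype is dropped if its minimum possible distance to the node's bounding box exceeds the smallest maximum distance of any candidate to that box. Such a prototype can never be the closest one for any point inside the box. This is a hypothetical Python sketch: assign, kmeans_step, min_dist2 and max_dist2 are illustrative names, and the tree is built on the fly by median splits. The paper's exact pruning test and data layout may differ.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def min_dist2(p: Vector, lo: Vector, hi: Vector) -> float:
    """Squared distance from p to the closest point of the box [lo, hi]."""
    return sum((l - x) ** 2 if x < l else (x - h) ** 2 if x > h else 0.0
               for x, l, h in zip(p, lo, hi))


def max_dist2(p: Vector, lo: Vector, hi: Vector) -> float:
    """Squared distance from p to the farthest corner of the box [lo, hi]."""
    return sum(max((x - l) ** 2, (x - h) ** 2) for x, l, h in zip(p, lo, hi))


def assign(points: List[Vector], protos: List[Vector], leaf_size: int = 8) -> List[int]:
    """Label every point with the index of its closest prototype."""
    labels = [0] * len(points)
    dim = len(points[0])

    def closest(p: Vector, cands: List[int]) -> int:
        return min(cands, key=lambda c: (sum((a - b) ** 2 for a, b in zip(p, protos[c])), c))

    def visit(ids: List[int], cands: List[int]) -> None:
        lo = [min(points[i][j] for i in ids) for j in range(dim)]
        hi = [max(points[i][j] for i in ids) for j in range(dim)]
        # Drop prototypes that cannot win anywhere inside this node's box.
        bound = min(max_dist2(protos[c], lo, hi) for c in cands)
        cands = [c for c in cands if min_dist2(protos[c], lo, hi) <= bound]
        if len(cands) == 1:
            for i in ids:
                labels[i] = cands[0]
        elif len(ids) <= leaf_size:
            for i in ids:
                labels[i] = closest(points[i], cands)
        else:
            # Split the node on its widest coordinate at the median point.
            j = max(range(dim), key=lambda d: hi[d] - lo[d])
            ids = sorted(ids, key=lambda i: points[i][j])
            visit(ids[: len(ids) // 2], cands)
            visit(ids[len(ids) // 2:], cands)

    visit(list(range(len(points))), list(range(len(protos))))
    return labels


def kmeans_step(points: List[Vector], protos: List[Vector]) -> Tuple[List[List[float]], List[int]]:
    """One Lloyd iteration using the pruned assignment above."""
    labels = assign(points, protos)
    dim = len(points[0])
    sums = [[0.0] * dim for _ in protos]
    counts = [0] * len(protos)
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for j in range(dim):
            sums[lab][j] += p[j]
    new = [[s / counts[c] for s in sums[c]] if counts[c] else list(protos[c])
           for c in range(len(protos))]
    return new, labels
```

Pruning only removes prototypes that provably lose, so the labels match a brute-force assignment. The savings come from nodes where a single candidate survives and every point in the node is labeled without computing a distance.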
Adaptive Multi-Stage Distance Join Processing
 In SIGMOD
, 1999
Abstract

Cited by 24 (1 self)
A spatial distance join is a relatively new type of operation introduced for spatial and multimedia database applications. Additional requirements for ranking and stopping cardinality are often combined with the spatial distance join in online query processing or internet search environments. These requirements pose new challenges as well as opportunities for more efficient processing of spatial distance join queries. In this paper, we first present an efficient k-distance join algorithm that uses spatial indexes such as R-trees. Bi-directional node expansion and plane-sweeping techniques are used for fast pruning of distant pairs, and the plane-sweeping is further optimized by novel strategies for selecting a sweeping axis and direction. Furthermore, we propose adaptive multi-stage algorithms for k-distance join and incremental distance join operations. Our performance study shows that the proposed adaptive multi-stage algorithms outperform previous work by up to an order of magnitu...
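Plane-sweep pruning for a k-distance join can be illustrated without R-trees, using two in-memory point sets. In this minimal Python sketch, both sets are swept together in x order. The inner scan stops as soon as the x gap alone exceeds the current k-th best distance. The name k_closest_pairs is hypothetical. The sketch omits the paper's index traversal, bi-directional node expansion, axis and direction selection, and adaptive multi-stage processing. Treat it only as an illustration of sweep-based pruning.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_closest_pairs(A: List[Point], B: List[Point], k: int) -> List[Tuple[float, int, int]]:
    """Return the k closest (distance, index in A, index in B) pairs."""
    if k <= 0:
        return []
    # Sweep both sets together in order of x; side 0 = A, side 1 = B.
    events = sorted([(p[0], 0, i) for i, p in enumerate(A)] +
                    [(q[0], 1, j) for j, q in enumerate(B)])
    heap: List[Tuple[float, int, int]] = []  # max-heap of (-distance, ia, ib)

    def kth() -> float:
        return -heap[0][0] if len(heap) == k else math.inf

    for pos, (x, side, idx) in enumerate(events):
        p = A[idx] if side == 0 else B[idx]
        for nxt in range(pos + 1, len(events)):
            x2, side2, idx2 = events[nxt]
            if x2 - x > kth():
                break  # every later point is at least this far away along x
            if side2 == side:
                continue  # only pairs with one point from each set count
            q = A[idx2] if side2 == 0 else B[idx2]
            d = math.dist(p, q)
            ia, ib = (idx, idx2) if side == 0 else (idx2, idx)
            if len(heap) < k:
                heapq.heappush(heap, (-d, ia, ib))
            elif d < kth():
                heapq.heapreplace(heap, (-d, ia, ib))
    return sorted((-nd, ia, ib) for nd, ia, ib in heap)
```

Each cross pair is examined at most once, from whichever point comes first in sweep order. Once the heap holds k pairs, the shrinking k-th distance cuts the inner scan short.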
The Bucket Box Intersection (BBI) Algorithm For Fast Approximative Evaluation Of Diagonal Mixture Gaussians
 In Proc. ICASSP
, 1996
Abstract

Cited by 23 (3 self)
Today, most state-of-the-art speech recognizers are based on Hidden Markov modeling. With semi-continuous or continuous density Hidden Markov Models, computing emission probabilities requires evaluating mixture Gaussian probability density functions. Since it is very expensive to evaluate all the Gaussians of the mixture density codebook, many recognizers only compute the M most significant Gaussians (M = 1, ..., 8). This paper presents an alternative approach to approximating mixture Gaussians with diagonal covariance matrices, based on a binary feature space partitioning tree. The proposed algorithm is experimentally evaluated on large-vocabulary, speaker-independent, spontaneous speech recognition using the JANUS-2 speech recognizer. For mixtures with 50 Gaussians, we achieve a speedup of 25 in the computation of HMM emission probabilities without affecting the accuracy of the system. 1. INTRODUCTION To approximate the log probab...
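The bucket-box idea can be sketched roughly as follows, under stated assumptions. Each diagonal Gaussian gets a box of mean plus or minus t standard deviations per coordinate, with t = 3 as an assumed parameter not taken from the paper. A binary tree splits the feature space at region midpoints. Each leaf bucket lists only the Gaussians whose boxes intersect it, and at query time only those Gaussians are evaluated. BucketBoxTree and log_gauss_diag are hypothetical names. The paper's box definition, tree construction and scoring may differ.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (a - m) ** 2 / v
                      for a, m, v in zip(x, mean, var))


class BucketBoxTree:
    """Binary space partition whose leaves (buckets) list the Gaussians whose
    boxes (mean +/- t standard deviations per coordinate) intersect them."""

    def __init__(self, means, variances, weights, lo, hi, depth=6, t=3.0):
        self.means, self.vars, self.weights = means, variances, weights
        self.boxes = [([m - t * math.sqrt(v) for m, v in zip(mu, var)],
                       [m + t * math.sqrt(v) for m, v in zip(mu, var)])
                      for mu, var in zip(means, variances)]
        self.root = self._build(list(lo), list(hi), list(range(len(means))), depth, 0)

    def _build(self, lo, hi, ids, depth, axis):
        # Keep only the Gaussians whose box intersects the region [lo, hi].
        ids = [g for g in ids
               if all(bl <= h and bh >= l for bl, bh, l, h in zip(*self.boxes[g], lo, hi))]
        if depth == 0 or len(ids) <= 1:
            return ids  # leaf bucket: list of Gaussian indices
        split = 0.5 * (lo[axis] + hi[axis])
        nxt = (axis + 1) % len(lo)
        left = self._build(lo, hi[:axis] + [split] + hi[axis + 1:], ids, depth - 1, nxt)
        right = self._build(lo[:axis] + [split] + lo[axis + 1:], hi, ids, depth - 1, nxt)
        return (axis, split, left, right)

    def log_prob(self, x: Vector) -> float:
        """Approximate log mixture density using only the bucket's Gaussians."""
        node = self.root
        while isinstance(node, tuple):
            axis, split, left, right = node
            node = left if x[axis] < split else right
        ids = node or range(len(self.means))  # empty bucket: fall back to all
        terms = [math.log(self.weights[g]) + log_gauss_diag(x, self.means[g], self.vars[g])
                 for g in ids]
        top = max(terms)
        return top + math.log(sum(math.exp(s - top) for s in terms))
```

Only a subset of the mixture terms is summed, so the approximate log density can only be lower than or equal to the exact one. The tree depth and t trade speed against accuracy.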
Unsupervised Distributed Clustering
, 2004
Abstract

Cited by 20 (12 self)
Clustering can be defined as the process of partitioning a set of patterns into disjoint, homogeneous, and meaningful groups, called clusters. The growing need for distributed clustering algorithms is driven by the huge size of today's databases. In this paper, we propose a modification of a recently proposed algorithm, namely k-windows, that is able to achieve high-quality results in distributed computing environments.
Clustering in evolutionary algorithms to efficiently compute simultaneously local and global minima
 In IEEE Congress on Evolutionary Computation
, 2005
Abstract

Cited by 15 (12 self)
In this paper, a new clustering operator for Evolutionary Algorithms is proposed. The operator incorporates the unsupervised k-windows clustering algorithm, using already computed information about the search space in an attempt to discover regions containing groups of individuals located close to different minimizers. Consequently, the search is confined to these regions, and a large number of global and local minima of the objective function can be computed efficiently. Extensive experiments show that the proposed approach is effective and reliable, and greatly accelerates the convergence of the considered algorithms.
Unsupervised clustering on dynamic databases
 Pattern Recognition Letters
, 2005
Abstract

Cited by 10 (7 self)
Clustering algorithms typically assume that the available data constitute a random sample from a stationary distribution. As data accumulate over time, the underlying process that generates them can change. Thus, algorithms that can extract clustering rules in non-stationary environments are needed. In this paper, we present an extension of the k-windows algorithm that can track the evolution of cluster models in dynamically changing databases without significant computational overhead. Experiments show that the k-windows algorithm can effectively and efficiently identify changes in the pattern structure. © 2005 Elsevier B.V. All rights reserved.