Results 1–10 of 102
Prototype selection for dissimilarity-based classifiers
 Pattern Recognition
, 2006
Abstract

Cited by 59 (8 self)
A conventional way to discriminate between objects represented by dissimilarities is the nearest neighbor method. A more efficient and sometimes more accurate solution is offered by other dissimilarity-based classifiers. They construct a decision rule based on the entire training set, but they need just a small set of prototypes, the so-called representation set, as a reference for classifying new objects. Such alternative approaches may be especially advantageous for non-Euclidean or even non-metric dissimilarities. The choice of a proper representation set for dissimilarity-based classifiers has not yet been fully investigated. It appears that a random selection may work well. In this paper, a number of experiments have been conducted on various metric and non-metric dissimilarity representations and prototype selection methods. Several procedures, such as traditional feature selection methods (here effectively searching for prototypes), mode seeking and linear programming, are compared to random selection. In general, we find that systematic approaches lead to better results than random selection, especially for a small number of prototypes. Although there is no single winner, as the outcome depends on data characteristics, the k-centres method works well in general. For two-class problems, an important observation is that our dissimilarity-based discrimination functions relying on significantly reduced prototype sets (3–10% of the training objects) offer similar or much better classification accuracy than the best k-NN rule on the entire training set. This may be achieved for multi-class data as well, although such problems are more difficult.
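As a rough illustration of the representation-set idea in the abstract above, here is a minimal sketch assuming Euclidean dissimilarities and a plain nearest-prototype decision rule (the paper's own classifiers, e.g. the linear-programming and k-centres procedures, are not reproduced; all function names are hypothetical):

```python
import numpy as np

def dissimilarity_representation(X, R):
    # Represent every object in X by its vector of Euclidean
    # dissimilarities to the prototypes in the representation set R.
    return np.linalg.norm(X[:, None, :] - R[None, :, :], axis=2)

def nearest_prototype_label(X, R, R_labels):
    # A minimal decision rule in dissimilarity space: assign the label
    # of the closest prototype, i.e. the 1-NN rule on the reduced set.
    D = dissimilarity_representation(X, R)
    return R_labels[np.argmin(D, axis=1)]
```

Any classifier can then be trained on the rows of `D` instead of the raw features, which is what makes a small, well-chosen representation set so effective.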
Learning weighted metrics to minimize nearest-neighbor classification error

, 2005
Document clustering via adaptive subspace iteration
 In SIGIR
, 2004
Abstract

Cited by 36 (7 self)
Document clustering has long been an important problem in information retrieval. In this paper, we present a new clustering algorithm, ASI, which explicitly models the subspace structure associated with each cluster. ASI simultaneously performs data reduction and subspace identification via an iterative alternating optimization procedure. Motivated by the optimization procedure, we then provide a novel method to determine the number of clusters. We also discuss the connections of ASI with various existing clustering approaches. Finally, extensive experimental results on real data sets show the effectiveness of the ASI algorithm.
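The alternating structure described above can be sketched in a few lines. This is an illustrative toy version only, assuming each cluster's subspace is the set of features where its centroid is largest; the paper's actual objective and update rules are not reproduced, and the function name is hypothetical:

```python
import numpy as np

def asi_sketch(X, k, n_features, n_iter=10, seed=0):
    # Alternate between subspace identification and data reduction,
    # in the spirit of ASI's alternating optimization.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # (1) subspace identification: keep the features where each
        #     centroid has the largest magnitude
        subspaces = [np.argsort(-np.abs(c))[:n_features] for c in centroids]
        # (2) data reduction: assign every row to the nearest centroid,
        #     measured only inside that centroid's subspace
        dists = np.stack([np.linalg.norm(X[:, s] - c[s], axis=1)
                          for c, s in zip(centroids, subspaces)])
        labels = dists.argmin(axis=0)
        # update centroids (keep the old one if a cluster empties)
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else centroids[j] for j in range(k)])
    return labels, subspaces
```

On data where different clusters live along different features, the subspace step and the assignment step reinforce each other, which is the core of the alternating scheme.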
Local distance functions: A taxonomy, new algorithms, and an evaluation
 In Proc. ICCV
, 2009
Abstract

Cited by 25 (0 self)
We present a taxonomy for local distance functions in which most existing algorithms can be regarded as approximations of the geodesic distance defined by a metric tensor. We categorize existing algorithms by how, where and when they estimate the metric tensor. We also extend the taxonomy along each axis. How: We introduce hybrid algorithms that use a combination of dimensionality reduction and metric learning to ameliorate overfitting. Where: We present an exact polynomial-time algorithm to integrate the metric tensor along the lines between the test and training points under the assumption that the metric tensor is piecewise constant. When: We propose an interpolation algorithm where the metric tensor is sampled at a number of reference points during the offline phase and then interpolated during online classification. We also present a comprehensive evaluation of all the algorithms on tasks in face recognition, object recognition, and digit recognition.
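The "Where" idea above, integrating a locally constant metric tensor along the line between two points, can be approximated numerically. This is a sketch under the assumption that the tensor is sampled at segment midpoints (not the paper's exact polynomial-time algorithm), with a hypothetical `metric_at` callback:

```python
import numpy as np

def local_geodesic(x, y, metric_at, n_seg=10):
    # Approximate the geodesic length of the straight line from x to y
    # by sampling the (locally constant) metric tensor at the midpoint
    # of each of n_seg segments and summing the Mahalanobis lengths.
    pts = np.linspace(x, y, n_seg + 1)
    total = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        M = metric_at((a + b) / 2)   # metric tensor on this segment
        d = b - a
        total += np.sqrt(d @ M @ d)
    return total
```

With an identity tensor everywhere this reduces to plain Euclidean distance, which is a useful sanity check; a spatially varying `metric_at` yields the locally adaptive distances the taxonomy organizes.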
Distance learning for similarity estimation
, 2008
Abstract

Cited by 24 (0 self)
In this paper, we present a general guideline for finding a better distance measure for similarity estimation based on a statistical analysis of distribution models and distance functions. A new set of distance measures is derived from the harmonic distance, the geometric distance, and their generalized variants according to Maximum Likelihood theory. These measures can provide a more accurate feature model than the classical Euclidean and Manhattan distances. We also find that the feature elements are often drawn from heterogeneous sources that may have different influence on similarity estimation. Therefore, the assumption of a single isotropic distribution model is often inappropriate. To alleviate this problem, we use a boosted distance measure framework that finds multiple distance measures which best fit the distribution of the selected feature elements for accurate similarity estimation. The new distance measures for similarity estimation are tested on two applications: stereo matching and motion tracking in video sequences. The performance of the boosted distance measure is further evaluated on several benchmark data sets from the UCI repository and two image retrieval applications. In all the experiments, robust results are obtained with the proposed methods.
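The core principle above, that the assumed residual distribution dictates the Maximum-Likelihood-optimal distance, can be illustrated with the two classical cases: Gaussian residuals give the squared Euclidean (SSD) form and Laplacian residuals give the Manhattan (SAD) form. The harmonic and geometric variants derived in the paper are not reproduced; this function is purely illustrative:

```python
import numpy as np

def ml_distance(x, y, model="gaussian"):
    # The negative log-likelihood of the residual r = |x - y| under the
    # assumed noise model, up to constants, acts as the distance measure.
    r = np.abs(np.asarray(x, float) - np.asarray(y, float))
    if model == "gaussian":
        return np.sum(r ** 2)   # SSD: ML-optimal for Gaussian residuals
    if model == "laplacian":
        return np.sum(r)        # SAD: ML-optimal for Laplacian residuals
    raise ValueError(f"unknown model: {model}")
```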
Query-sensitive embeddings
 In ACM International Conference on Management of Data (SIGMOD), 706–717
Abstract

Cited by 23 (11 self)
A common problem in many types of databases is retrieving the most similar matches to a query object. Finding those matches in a large database can be too slow to be practical, especially in domains where objects are compared using computationally expensive similarity (or distance) measures. Embedding methods can significantly speed up retrieval by mapping objects into a vector space, where distances can be measured rapidly using a Minkowski metric. In this paper we present a novel way to improve embedding quality. In particular, we propose to construct embeddings that use a “query-sensitive” distance measure for the target space of the embedding. This distance measure is used to compare the vectors that the query and database objects are mapped to. The term “query-sensitive” means that the distance measure changes depending on the current query object. We demonstrate theoretically that using a query-sensitive distance measure increases the modeling power of embeddings and allows them to capture more of the structure of the original space. We also demonstrate experimentally that query-sensitive embeddings can significantly improve retrieval performance. In experiments with an image database of handwritten digits and a time-series database, the proposed method outperforms existing state-of-the-art non-Euclidean indexing methods, meaning that it provides significantly better tradeoffs between efficiency and retrieval accuracy.
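A minimal sketch of the two ingredients above: a reference-object (Lipschitz-style) embedding into a vector space, and a target-space distance whose per-dimension weights depend on the query. How the weights are learned is the paper's contribution and is not reproduced; here `weights_for` is a hypothetical callback:

```python
import numpy as np

def lipschitz_embed(x, references, dist):
    # Map an object into R^k via its (possibly expensive) distances
    # to k fixed reference objects.
    return np.array([dist(x, r) for r in references], dtype=float)

def query_sensitive_dist(q_emb, db_emb, weights_for):
    # Weighted Euclidean distance in the target space, where the
    # weights are chosen per query: weights_for(q_emb) returns
    # non-negative weights, one per embedding dimension.
    w = weights_for(q_emb)
    return np.sqrt(np.sum(w * (q_emb - db_emb) ** 2))
```

With uniform weights this collapses to the ordinary embedding distance; query-dependent weights let different embedding dimensions matter for different queries, which is the source of the extra modeling power.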
Toward robust distance metric analysis for similarity estimation
 In IEEE International Conference on Computer Vision and Pattern Recognition
, 2006
Abstract

Cited by 19 (1 self)
In this paper, we present a general guideline to establish the relation between a distribution model and its corresponding similarity estimation. A rich set of distance metrics, such as the harmonic distance and the geometric distance, is derived according to Maximum Likelihood theory. These metrics can provide a more accurate feature model than the conventional Euclidean distance (SSD) and Manhattan distance (SAD). Because the feature elements are from heterogeneous sources and may have different influence on similarity estimation, the assumption of a single isotropic distribution model is often inappropriate. We propose a novel boosted distance metric that not only finds the best distance metric fitting the distribution of the underlying elements but also selects the most important feature elements with respect to similarity. We experiment with different distance metrics for similarity estimation and compute …
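The boosted selection of informative feature elements can be sketched with an AdaBoost-style loop over single-feature weak learners on labeled pairs (label 1 for similar pairs, 0 for dissimilar). This is a simplified stand-in, not the paper's algorithm; the thresholding rule and function name are my assumptions:

```python
import numpy as np

def boost_feature_weights(pairs, labels, n_rounds=5):
    # pairs: (n, 2, d) array of object pairs; labels: 1 = similar, 0 = not.
    # Each round picks the feature whose per-element gap best separates
    # similar from dissimilar pairs under the current pair weights, then
    # up-weights the pairs it got wrong (AdaBoost-style).
    diffs = np.abs(pairs[:, 0] - pairs[:, 1])      # (n, d) per-feature gaps
    w = np.ones(len(labels)) / len(labels)
    feat_weight = np.zeros(diffs.shape[1])
    for _ in range(n_rounds):
        best = None
        for f in range(diffs.shape[1]):
            thr = np.median(diffs[:, f])
            pred = (diffs[:, f] < thr).astype(int)  # small gap -> similar
            err = np.sum(w * (pred != labels))
            if best is None or err < best[0]:
                best = (err, f, thr)
        err, f, thr = best
        err = min(max(err, 1e-10), 1 - 1e-10)       # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        feat_weight[f] += alpha
        pred = (diffs[:, f] < thr).astype(int)
        w *= np.exp(alpha * (pred != labels))       # boost the mistakes
        w /= w.sum()
    return feat_weight / feat_weight.sum()
```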
Adaptive Kernel Metric Nearest Neighbor Classification
 In Proceedings of the Sixteenth International Conference on Pattern Recognition
, 2002
Abstract

Cited by 17 (0 self)
Nearest neighbor classification assumes locally constant class-conditional probabilities. This assumption becomes invalid in high dimensions due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose an adaptive nearest neighbor classification method to try to minimize bias. We use quasiconformally transformed kernels to compute neighborhoods over which the class probabilities tend to be more homogeneous. As a result, better classification performance can be expected. The efficacy of our method is validated and compared against other competing techniques using a variety of data sets.
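The kernel machinery behind the abstract above can be sketched in two pieces: the distance induced by a kernel in feature space, and the quasiconformal transformation K~(x, y) = c(x) c(y) K(x, y) that reshapes that distance. How the scaling function c is chosen is the paper's contribution and is not shown; this is only the mechanism:

```python
import numpy as np

def kernel_distance(x, y, kernel):
    # Distance induced in the kernel feature space:
    # ||phi(x) - phi(y)||^2 = K(x,x) + K(y,y) - 2 K(x,y).
    sq = kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)
    return np.sqrt(max(sq, 0.0))   # clamp tiny negatives from rounding

def quasiconformal(kernel, c):
    # Quasiconformal transformation of a kernel: K~(x,y) = c(x) c(y) K(x,y).
    # Choosing c(.) large near the decision boundary expands distances
    # there, making neighborhoods more class-homogeneous.
    return lambda x, y: c(x) * c(y) * kernel(x, y)
```

With a constant c the transformation just rescales all distances uniformly; a spatially varying c is what makes the neighborhoods adaptive.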
An Adaptive Metric Machine for Pattern Classification
 Advances in Neural Information Processing Systems 13
, 2000
Abstract

Cited by 14 (4 self)
Nearest neighbor classification assumes locally constant class-conditional probabilities. This assumption becomes invalid in high dimensions with finite samples due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose a locally adaptive nearest neighbor classification method to try to minimize bias. We use a Chi-squared distance analysis to compute a flexible metric that produces neighborhoods elongated along less relevant feature dimensions and constricted along the most influential ones. As a result, the class-conditional probabilities tend to be smoother in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other techniques using a variety of real-world data.
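The elongate/constrict idea above can be sketched with a simple local relevance score. This is a stand-in for the paper's Chi-squared analysis (here relevance is just the spread of local class means along each feature), and the function names are hypothetical:

```python
import numpy as np

def local_relevance_weights(X, y, x0, k=6):
    # In x0's k-nearest neighborhood, score each feature by how far the
    # class means sit from the overall mean along it, then normalize into
    # per-feature metric weights. High-relevance features constrict the
    # neighborhood; low-relevance features leave it elongated.
    nn = np.argsort(np.linalg.norm(X - x0, axis=1))[:k]
    Xl, yl = X[nn], y[nn]
    rel = np.zeros(X.shape[1])
    for c in np.unique(yl):
        rel += (Xl[yl == c].mean(axis=0) - Xl.mean(axis=0)) ** 2
    return rel / max(rel.sum(), 1e-12)

def weighted_distance(a, b, w):
    # The resulting locally adaptive (weighted Euclidean) metric.
    return np.sqrt(np.sum(w * (np.asarray(a, float) - np.asarray(b, float)) ** 2))
```

Running k-NN with `weighted_distance` in place of the plain Euclidean metric gives a locally adaptive neighborhood in the spirit of the method described above.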