Results 21 - 30
of
1,558
Fisher Discriminant Analysis With Kernels
, 1999
"... A non-linear classification technique based on Fisher's discriminant is proposed. The main ingredient is the kernel trick which allows the efficient computation of Fisher discriminant in feature space. The linear classification in feature space corresponds to a (powerful) non-linear decision functi ..."
Abstract
-
Cited by 231 (14 self)
- Add to MetaCart
A non-linear classification technique based on Fisher's discriminant is proposed. The main ingredient is the kernel trick which allows the efficient computation of Fisher discriminant in feature space. The linear classification in feature space corresponds to a (powerful) non-linear decision function in input space. Large scale simulations demonstrate the competitiveness of our approach.
Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces
, 1993
"... We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation is very high. Also relevant are high-dim ..."
Abstract
-
Cited by 225 (4 self)
- Add to MetaCart
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation is very high. Also relevant are high-dimensional Euclidian settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidian cases, kd-tree performance is compared.
PCA versus LDA
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2001
"... In the context of the appearance-based paradigm for object recognition, it is generally believed that algorithms based on LDA (Linear Discriminant Analysis) are superior to those based on PCA (Principal Components Analysis) . In this communication we show that this is not always the case. We present ..."
Abstract
-
Cited by 219 (14 self)
- Add to MetaCart
In the context of the appearance-based paradigm for object recognition, it is generally believed that algorithms based on LDA (Linear Discriminant Analysis) are superior to those based on PCA (Principal Components Analysis) . In this communication we show that this is not always the case. We present our case first by using intuitively plausible arguments and then by showing actual results on a face database. Our overall conclusion is that when the training dataset is small, PCA can outperform LDA, and also that PCA is less sensitive to different training datasets. Keywords: face recognition, pattern recognition, principal components analysis, linear discriminant analysis, learning from undersampled distributions, small training datasets. 1
Scaling Clustering Algorithms to Large Databases”, Microsoft Research Report
, 1998
"... Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable clustering framework applicable to a wide class of iterative clustering. We require at most one scan of the database. In this wor ..."
Abstract
-
Cited by 197 (5 self)
- Add to MetaCart
Practical clustering algorithms require multiple data scans to achieve convergence. For large databases, these scans become prohibitively expensive. We present a scalable clustering framework applicable to a wide class of iterative clustering. We require at most one scan of the database. In this work, the framework is instantiated and numerically justified with the popular K-Means clustering algorithm. The method is based on identifying regions of the data that are compressible, regions that must be maintained in memory, and regions that are discardable. The algorithm operates within the confines of a limited memory buffer. Empirical results demonstrate that the scalable scheme outperforms a sampling-based approach. In our scheme, data resolution is preserved to the extent possible based upon the size of the allocated memory buffer and the fit of current clustering model to the data. The framework is naturally extended to update multiple clustering models simultaneously. We empirically evaluate on synthetic and publicly available data sets.
Refining Initial Points for K-Means Clustering
, 1998
"... Practical approaches to clustering use an iterative procedure (e.g. K-Means, EM) which converges to one of numerous local minima. It is known that these iterative techniques are especially sensitive to initial starting conditions. We present a procedure for computing a refined starting condition fro ..."
Abstract
-
Cited by 184 (5 self)
- Add to MetaCart
Practical approaches to clustering use an iterative procedure (e.g. K-Means, EM) which converges to one of numerous local minima. It is known that these iterative techniques are especially sensitive to initial starting conditions. We present a procedure for computing a refined starting condition from a given initial one that is based on an efficient technique for estimating the modes of a distribution. The refined initial starting condition allows the iterative algorithm to converge to a "better" local minimum. The procedure is applicable to a wide class of clustering algorithms for both discrete and continuous data. We demonstrate the application of this method to the popular K-Means clustering algorithm and show that refined initial starting points indeed lead to improved solutions. Refinement run time is considerably lower than the time required to cluster the full database. The method is scalable and can be coupled with a scalable clustering algorithm to address the large-scale cl...
The TV-tree -- an index structure for high-dimensional data
- VLDB Journal
, 1994
"... We propose a file structure to index high-dimensionality data, typically, points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree struc ..."
Abstract
-
Cited by 177 (7 self)
- Add to MetaCart
We propose a file structure to index high-dimensionality data, typically, points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such `varying length' feature vectors. Finally we report simulation results, comparing the proposed structure with the R -tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, with up to 80% savings in disk accesses. Type of Contribution: New Index Structure, for high-dimensionality feature spaces. Algorithms and performance measurements. Keywords: Spatial Index, Similarity Retrieval, Query by Content 1 Introduction Many applications require enhanced indexing, capable of performing similarity searching on several, non-traditional (`exotic') data types. The targ...
Survey of clustering data mining techniques
, 2002
"... Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in math ..."
Abstract
-
Cited by 177 (0 self)
- Add to MetaCart
Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique
Recognition without Correspondence using Multidimensional Receptive Field Histograms
- International Journal of Computer Vision
, 2000
"... . The appearance of an object is composed of local structure. This local structure can be described and characterized by a vector of local features measured by local operators such as Gaussian derivatives or Gabor filters. This article presents a technique where appearances of objects are represente ..."
Abstract
-
Cited by 176 (15 self)
- Add to MetaCart
. The appearance of an object is composed of local structure. This local structure can be described and characterized by a vector of local features measured by local operators such as Gaussian derivatives or Gabor filters. This article presents a technique where appearances of objects are represented by the joint statistics of such local neighborhood operators. As such, this represents a new class of appearance based techniques for computer vision. Based on joint statistics, the paper develops techniques for the identification of multiple objects at arbitrary positions and orientations in a cluttered scene. Experiments show that these techniques can identify over 100 objects in the presence of major occlusions. Most remarkably, the techniques have low complexity and therefore run in real-time. 1. Introduction The paper proposes a framework for the statistical representation of the appearance of arbitrary 3D objects. This representation consists of a probability density function or jo...
Discriminant Analysis for Recognition of Human Face Images
- Journal of Optical Society of America A
, 1997
"... this paper we focus on featureextraction and face-identification processes ..."
Abstract
-
Cited by 155 (6 self)
- Add to MetaCart
this paper we focus on featureextraction and face-identification processes
Robust Analysis of Feature Spaces: Color Image Segmentation
, 1997
"... A general technique for the recovery of significant image features is presented. The technique is basedon the mean shift algorithm, a simple nonparametric procedure for estimating density gradients. Drawbacks of the current methods (including robust clustering) are avoided. Featurespace of any natu ..."
Abstract
-
Cited by 152 (5 self)
- Add to MetaCart
A general technique for the recovery of significant image features is presented. The technique is basedon the mean shift algorithm, a simple nonparametric procedure for estimating density gradients. Drawbacks of the current methods (including robust clustering) are avoided. Featurespace of any naturecan beprocessed, and as an example, color image segmentation is discussed. The segmentation is completely autonomous, only its class is chosen by the user. Thus, the same program can produce a high quality edge image, or provide, by extracting all the significant colors, a preprocessor for content-based query systems. A 512 x 512 color image is analyzed in less than 10 seconds on a standard workstation. Gray level images are handled as color images having only the lightness coordinate.

