Results 1  10
of
263,292
On Spectral Clustering: Analysis and an algorithm
 ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS
, 2001
"... Despite many empirical successes of spectral clustering methods  algorithms that cluster points using eigenvectors of matrices derived from the distances between the points  there are several unresolved issues. First, there is a wide variety of algorithms that use the eigenvectors in slightly ..."
Abstract

Cited by 1697 (13 self)
 Add to MetaCart
Despite many empirical successes of spectral clustering methods  algorithms that cluster points using eigenvectors of matrices derived from the distances between the points  there are several unresolved issues. First, there is a wide variety of algorithms that use the eigenvectors
FastMap: A Fast Algorithm for Indexing, DataMining and Visualization of Traditional and Multimedia Datasets
, 1995
"... A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in kd space, using k featureextraction functions, provided by a domain expert [25]. Thus, we can subsequently use highly finetuned spatial access methods (SAMs), to answer several types ..."
Abstract

Cited by 497 (23 self)
 Add to MetaCart
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in kd space, using k featureextraction functions, provided by a domain expert [25]. Thus, we can subsequently use highly finetuned spatial access methods (SAMs), to answer several
Scatter/Gather: A Clusterbased Approach to Browsing Large Document Collections
, 1992
"... Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with running time often quadratic in the number of documents); and second, that clustering does not appreciably ..."
Abstract

Cited by 772 (12 self)
 Add to MetaCart
Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with running time often quadratic in the number of documents); and second, that clustering does not appreciably
Mean shift, mode seeking, and clustering
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1995
"... AbstractMean shift, a simple iterative procedure that shifts each data point to the average of data points in its neighborhood, is generalized and analyzed in this paper. This generalization makes some kmeans like clustering algorithms its special cases. It is shown that mean shift is a modeseeki ..."
Abstract

Cited by 620 (0 self)
 Add to MetaCart
AbstractMean shift, a simple iterative procedure that shifts each data point to the average of data points in its neighborhood, is generalized and analyzed in this paper. This generalization makes some kmeans like clustering algorithms its special cases. It is shown that mean shift is a mode
GPFS: A SharedDisk File System for Large Computing Clusters
 In Proceedings of the 2002 Conference on File and Storage Technologies (FAST
, 2002
"... GPFS is IBM's parallel, shareddisk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community ove ..."
Abstract

Cited by 518 (3 self)
 Add to MetaCart
existing ideas scaled well, new approaches were necessary in many key areas. This paper describes GPFS, and discusses how distributed locking and recovery techniques were extended to scale to large clusters.
Efficient similarity search in sequence databases
, 1994
"... We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Anot ..."
Abstract

Cited by 505 (21 self)
 Add to MetaCart
We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong
OPTICS: Ordering Points To Identify the Clustering Structure
, 1999
"... Cluster analysis is a primary method for database mining. It is either used as a standalone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of ..."
Abstract

Cited by 511 (49 self)
 Add to MetaCart
Cluster analysis is a primary method for database mining. It is either used as a standalone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all
Probabilistic Latent Semantic Indexing
, 1999
"... Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized ..."
Abstract

Cited by 1207 (11 self)
 Add to MetaCart
Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized
Mtree: An Efficient Access Method for Similarity Search in Metric Spaces
, 1997
"... A new access meth d, called Mtree, is proposed to organize and search large data sets from a generic "metric space", i.e. whE4 object proximity is only defined by a distance function satisfyingth positivity, symmetry, and triangle inequality postulates. We detail algorith[ for insertion o ..."
Abstract

Cited by 652 (38 self)
 Add to MetaCart
A new access meth d, called Mtree, is proposed to organize and search large data sets from a generic "metric space", i.e. whE4 object proximity is only defined by a distance function satisfyingth positivity, symmetry, and triangle inequality postulates. We detail algorith[ for insertion
Fast Algorithms for Mining Association Rules
, 1994
"... We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known a ..."
Abstract

Cited by 3551 (15 self)
 Add to MetaCart
algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scaleup experiments show that AprioriHybrid scales linearly
Results 1  10
of
263,292