Results 1 - 10 of 4,440
SWAP-2006 Shrinking Number of Clusters by Multi-Dimensional Scaling
"... Abstract — Clustering is to divide given data and then, automatically find out the meanings hidden in the data. It analyzes data, which are difficult for people to check in detail, and then, makes several clusters consisting of data with similar characteristics. Clustering, which is used in various ..."
of deciding the number of clusters, which is projecting the center of a cluster on the two-dimensional plane by use of Multi-Dimensional Scaling, and then, combining the clusters. As a result of experimenting this method with real data, it was found that clustering performance became better.
CURE: An Efficient Clustering Algorithm for Large Data sets
- Published in the Proceedings of the ACM SIGMOD Conference, 1998
"... Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering ..."
Cited by 722 (5 self)
clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes and wide variances in size. CURE achieves this by representing each cluster by a certain fixed number of points that are generated by selecting well scattered points from the cluster
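A minimal sketch of the representative-point idea in the CURE entry above, not the authors' implementation. It assumes that "well scattered" points can be approximated by farthest-first selection, and it adds the shrink-toward-the-centroid step from the published CURE description, which the truncated snippet does not show. The names `scattered_points`, `shrink`, and `alpha` are illustrative.

```python
import math


def scattered_points(points, c):
    """Pick up to c well-scattered points by farthest-first selection (assumed heuristic)."""
    n = len(points)
    dim = len(points[0])
    centroid = tuple(sum(p[d] for p in points) / n for d in range(dim))
    # Seed with the point farthest from the centroid.
    chosen = [max(points, key=lambda p: math.dist(p, centroid))]
    while len(chosen) < min(c, n):
        # Add the point whose nearest already-chosen point is farthest away.
        nxt = max(points, key=lambda p: min(math.dist(p, q) for q in chosen))
        chosen.append(nxt)
    return chosen, centroid


def shrink(reps, centroid, alpha):
    """Move each representative a fraction alpha of the way toward the centroid."""
    return [tuple(x + alpha * (m - x) for x, m in zip(p, centroid)) for p in reps]


# Toy cluster: four corners of a square plus its center.
cluster = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0), (2.0, 2.0)]
reps, centroid = scattered_points(cluster, 4)
shrunk = shrink(reps, centroid, 0.5)
```

On this toy cluster the four corners are selected as representatives and the interior point is not. Shrinking pulls the representatives inward, which is how a single outlier chosen as a representative ends up with less influence on later merge decisions.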
Laplacian eigenmaps and spectral techniques for embedding and clustering.
- Proceedings of Neural Information Processing Systems, 2001
"... Abstract Drawing on the correspondence between the graph Laplacian, the Laplace-Beltrami operator on a manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for constructing a representation for data sampled from a low dimensional manifold embedded in ..."
Cited by 668 (7 self)
in a higher dimensional space. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality preserving properties and a natural connection to clustering. Several applications are considered. In many areas of artificial intelligence, information
Features of similarity.
- Psychological Review, 1977
"... Similarity plays a fundamental role in theories of knowledge and behavior. It serves as an organizing principle by which individuals classify objects, form concepts, and make generalizations. Indeed, the concept of similarity is ubiquitous in psychological theory. It underlies the accounts of stimu ..."
Cited by 1455 (2 self)
. These models represent objects as points in some coordinate space such that the observed dissimilarities between objects correspond to the metric distances between the respective points. Practically all analyses of proximity data have been metric in nature, although some (e.g., hierarchical clustering) yield
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
- In Proceedings of the 17th International Conf. on Machine Learning, 2000
"... Despite its popularity for general clustering, K-means suffers three major shortcomings; it scales poorly computationally, the number of clusters K has to be supplied by the user, and the search is prone to local minima. We propose solutions for the first two problems, and a partial remedy for the t ..."
Cited by 418 (5 self)
Self-tuning spectral clustering
- Advances in Neural Information Processing Systems 17, 2004
"... We study a number of open issues in spectral clustering: (i) Selecting the appropriate scale of analysis, (ii) Handling multi-scale data, (iii) Clustering with irregular background clutter, and, (iv) Finding automatically the number of groups. We first propose that a ‘local’ scale should be used to ..."
Cited by 362 (2 self)
Bioprospector: Discovering Conserved DNA Motifs In Upstream Regulatory Regions Of Co-Expressed Genes
- Pac. Symp. Biocomput, 2001
"... ms. For a copy of the program and documentation for UNIX systems, please contact xliu@smi.stanford.edu. 1 Introduction Over the last ten years, genomic sequencing has started in over 600 organisms, and over 50 complete genomes are sequenced. The DNA microarray technology permits the measurement o ..."
Cited by 354 (23 self)
of gene expression in cultured cells [1]. An increasing number of laboratories are using the combination of these two methods to study gene expression on a genomic scale. After all the genes from an organism are clustered based on their expression patterns [2], an important next step is to examine
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
"... DryadLINQ is a system and a set of language extensions that enable a new programming model for large scale distributed computing. It generalizes previous execution environments such as SQL, MapReduce, and Dryad in two ways: by adopting an expressive data model of strongly typed .NET objects; and by s ..."
Cited by 273 (27 self)
show that excellent absolute performance can be attained (a general-purpose sort of 10^12 Bytes of data executes in 319 seconds on a 240-computer, 960-disk cluster) as well as demonstrating near-linear scaling of execution time on representative applications as we vary the number of computers used for a
Large-Scale Multi-Dimensional Document Clustering on GPU Clusters
"... Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, includin ..."
Cited by 5 (2 self)
assess the benefits of exploiting the computational power of Beowulf-like clusters equipped with contemporary Graphics Processing Units (GPUs) as a means to significantly reduce the runtime of flocking-based document clustering. Our framework scales up to over one million documents processed
Support Vector Clustering
- 2001
"... We present a novel clustering method using the approach of support vector machines. Data points are mapped by means of a Gaussian kernel to a high dimensional feature space, where we search for the minimal enclosing sphere. This sphere, when mapped back to data space, can separate into several compo ..."
Cited by 215 (1 self)