On the Performance of Ant-Based Clustering
Proc. of the 3rd Int. Conf. on Hybrid Intelligent Systems, IOS, 2003
Cited by 27 (1 self)
Ant-based clustering and sorting is a nature-inspired heuristic for general clustering tasks. It has been applied variously, from problems arising in commerce, to circuit design, to text-mining, all with some promise. However, although early results were broadly encouraging, there has been very limited analytical evaluation of the algorithm. Toward this end, we first propose a scheme that enables unbiased interpretation of the clustering solutions obtained, and then use this to conduct a full evaluation of the algorithm. Our analysis uses three sets each of real and artificial data, and four distinct analytical measures. These results are compared with those obtained using established clustering techniques and we find evidence that ant-based clustering is a robust and viable alternative.
Scalable Parallel Clustering for Data Mining on Multicomputers
Lecture Notes in Computer Science, 2000
Cited by 23 (1 self)
This paper describes the design and implementation on MIMD parallel machines of P-AutoClass, a parallel version of the AutoClass system based upon the Bayesian method for determining optimal classes in large datasets. The P-AutoClass implementation divides the clustering task among the processors of a multicomputer so that they work on their own partition and exchange their intermediate results. The system architecture, its implementation and experimental performance results on different processor numbers and dataset sizes are presented and discussed. In particular, efficiency and scalability of P-AutoClass versus the sequential AutoClass system are evaluated and compared.
Automatic Clustering Using an Improved Differential Evolution Algorithm
2008
Cited by 23 (3 self)
Differential evolution (DE) has emerged as one of the fast, robust, and efficient global search heuristics of current interest. This paper describes an application of DE to the automatic clustering of large unlabeled data sets. In contrast to most of the existing clustering techniques, the proposed algorithm requires no prior knowledge of the data to be classified. Rather, it determines the optimal number of partitions of the data “on the run.” Superiority of the new method is demonstrated by comparing it with two recently developed partitional clustering techniques and one popular hierarchical clustering algorithm. The partitional clustering algorithms are based on two powerful, well-known optimization algorithms, namely the genetic algorithm and particle swarm optimization. An interesting real-world application of the proposed method to automatic segmentation of images is also reported.
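For orientation, the following is a minimal sketch of the classic DE/rand/1/bin scheme that this line of work builds on. It is not the improved variant the paper proposes, and it optimizes a generic cost function rather than a clustering validity index. The function name, the parameter defaults (`F`, `CR`, `pop_size`) and the bound clipping are assumptions chosen for illustration.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    """Minimize f over box bounds with classic DE/rand/1/bin (illustrative)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip to bounds (assumed policy)
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = f(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    print(differential_evolution(sphere, [(-5.0, 5.0)] * 2))
```

In a clustering setting, each individual would typically encode candidate cluster centres and `f` would be a cluster validity measure; that encoding is left out here because the abstract does not specify it.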
A scalable parallel subspace clustering algorithm for massive data sets
In: Proc. International Conference on Parallel Processing, 2000
Cited by 20 (0 self)
Clustering is a data mining problem which finds dense regions in a sparse multidimensional data set. The attribute values and ranges of these regions characterize the clusters. Clustering algorithms need to scale with the database size and also with the large dimensionality of the data set. Further, these algorithms need to explore the embedded clusters in a subspace of a high-dimensional space. However, the time complexity of the algorithm to explore clusters in subspaces is exponential in the dimensionality of the data and is thus extremely compute-intensive. Thus, parallelization is the choice for discovering clusters for large data sets. In this paper we present a scalable parallel subspace clustering algorithm which has both data and task parallelism embedded in it. We also formulate the technique of adaptive grids and present a truly unsupervised clustering algorithm requiring no user inputs. Our implementation shows near-linear speedups with negligible communication overheads. The use of adaptive grids results in two orders of magnitude improvement in the computation time of our serial algorithm over current methods with much better quality of clustering. Performance results on both real and synthetic data sets with a very large number of dimensions on a 16-node IBM SP2 demonstrate our algorithm to be a practical and scalable clustering technique.
Fast Agglomerative Clustering for Rendering
Cited by 18 (9 self)
Hierarchical representations of large data sets, such as binary cluster trees, are a crucial component in many scalable algorithms used in various fields. Two major approaches for building these trees are agglomerative, or bottom-up, clustering and divisive, or top-down, clustering. The agglomerative approach offers some real advantages such as more flexible clustering and often produces higher quality trees, but has been little used in graphics because it is frequently assumed to be prohibitively expensive (O(N^2) or worse). In this paper we show that agglomerative clustering can be done efficiently even for very large data sets. We introduce a novel locally-ordered algorithm that is faster than traditional heap-based agglomerative clustering and show that the complexity of the tree build time is much closer to linear than quadratic. We also evaluate the quality of the agglomerative clustering trees compared to the best known divisive clustering strategies in two sample applications: bounding volume hierarchies for ray tracing and light trees in the Lightcuts rendering algorithm. Tree quality is highly application, data set, and dissimilarity function specific. In our experiments the agglomerative-built tree quality is consistently higher by margins ranging from slight to significant, with up to 35% reduction in tree query times.
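As a point of reference, the traditional heap-based agglomerative build that the abstract uses as its baseline can be sketched roughly as follows. This is not the locally-ordered algorithm the paper introduces. The dissimilarity used here (squared distance between cluster centroids) and the function name `agglomerate` are assumptions for illustration; the abstract notes that the right dissimilarity function depends on the application.

```python
import heapq
import itertools


def agglomerate(points):
    """Build a binary cluster tree bottom-up; leaves are point indices.

    Expects a non-empty list of equal-length coordinate tuples.
    """
    # active maps cluster id -> (centroid, size, subtree)
    active = {i: (p, 1, i) for i, p in enumerate(points)}
    next_id = len(points)

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Seeding the heap with every pair is the O(N^2) cost the abstract refers to.
    heap = [(dist(active[i][0], active[j][0]), i, j)
            for i, j in itertools.combinations(active, 2)]
    heapq.heapify(heap)

    while len(active) > 1:
        _, i, j = heapq.heappop(heap)
        if i not in active or j not in active:
            continue  # stale entry: one side was already merged away
        (ci, ni, ti), (cj, nj, tj) = active.pop(i), active.pop(j)
        n = ni + nj
        centroid = tuple((ni * u + nj * v) / n for u, v in zip(ci, cj))
        # Push distances from the new cluster to every remaining cluster.
        for k, (ck, _, _) in active.items():
            heapq.heappush(heap, (dist(centroid, ck), k, next_id))
        active[next_id] = (centroid, n, (ti, tj))
        next_id += 1

    (_, _, tree), = active.values()
    return tree


if __name__ == "__main__":
    print(agglomerate([(0.0,), (1.0,), (10.0,), (11.0,)]))
```

Stale heap entries are skipped lazily rather than deleted, which keeps the sketch short at the cost of a larger heap.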
Time and Space Efficient Pose Clustering
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1994
Cited by 16 (6 self)
This paper shows that the pose clustering method of object recognition can be decomposed into small subproblems without loss of accuracy. Randomization can then be used to limit the number of subproblems that need to be examined to achieve accurate recognition. These techniques are used to decrease the computational complexity of pose clustering. The clustering step is formulated as an efficient tree search of the pose space. This method requires little memory since not many poses are clustered at a time. Analysis shows that pose clustering is not inherently more sensitive to noise than other methods of generating hypotheses. Finally, experiments on real and synthetic data are presented.
1 Introduction
Model-based object recognition systems determine which objects appear in images using a catalog of object models and estimate their positions and orientations (poses) relative to the camera. This paper examines methods of improving the efficiency of the pose clustering method of object ...
A method for decentralized clustering in large multi-agent systems
In Proceedings of the Second International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2003
Cited by 16 (4 self)
This paper examines a method of clustering within a fully decentralized multi-agent system. Our goal is to group agents with similar objectives or data, as is done in traditional clustering. However, we add the constraint that agents must remain in place on a network, instead of first being collected into a centralized database. To do this we connect agents in a random network and have them search in a peer-to-peer fashion for other similar agents. We thus aim to tackle the basic clustering problem on an Internet scale and create a method by which agents themselves can be grouped, forming coalitions. In order to investigate the feasibility of a decentralized approach, this paper presents a number of simulation experiments involving agents representing two-dimensional points. A comparison between our method’s clustering ability and that of the k-means clustering algorithm is presented. Generated data sets containing 2,500 to 160,000 points (agents) grouped in 25 to 1,600 clusters are examined. Results show that our decentralized agent method produces a better clustering than the centralized k-means algorithm, quickly placing 95% to 99% of points correctly. The time required to find a clustering depends on the quality of solution required; a fairly good solution is quickly converged on, and then slowly improved. Overall, our experiments indicate that the time to find a particular quality of solution increases less than linearly with the number of agents.
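The centralized k-means baseline mentioned in this abstract can be sketched as plain Lloyd iteration over two-dimensional points. This illustrates only the comparison method, not the decentralized agent protocol. The function name, the random initialization from the data, and the fixed iteration count are assumptions rather than details taken from the paper.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(data, 2))
```

Note that this version needs all points in one place and a fixed `k` up front, which is exactly the centralization the paper's method sets out to avoid.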
Ant-Based Clustering: A Comparative Study of its relative performance with respect to k-means, average link and 1D-SOM
2003
Cited by 14 (1 self)
Ant-based clustering and sorting is a nature-inspired heuristic for general clustering tasks. It has been applied variously, from problems arising in commerce, to circuit design, to text-mining, all with some promise. However, although early results were broadly encouraging, there has been very limited analytical evaluation of ant-based clustering. Toward this end, we first propose a scheme that enables unbiased interpretation of the clustering solutions obtained, and then use this to conduct a full evaluation of the algorithm. Our analysis uses three sets each of real and artificial data, and four distinct analytical measures. These results are compared with those obtained using established clustering techniques and we find evidence that ant-based clustering is a robust and viable alternative.
InfoGrid: Providing Information Integration for Knowledge Discovery
Information Sciences, 2003
Cited by 12 (1 self)
Many scientific experiments produce large amounts of data using high-throughput devices. In order to analyse this type of data, Knowledge Discovery systems are required. However, generic laboratory systems do not provide any contextual information about the system that is being studied. In these situations, Knowledge Discovery can be aided and validated by the use of information integration tools. In this paper, we introduce InfoGrid, a data integration middleware engine designed to operate under a Grid framework. It focuses on providing information access services and offers all users a query system which is able to retain the familiarity with their specific scientific applications while being diverse, flexible and open at the same time. The assumption there is that defining a common language for all queries is not desirable.
A Survey On: Content Based Image Retrieval Systems Using Clustering Techniques For Large Data sets
Cited by 12 (0 self)
Content-based image retrieval (CBIR) is a new but widely adopted method for finding images from vast and unannotated image databases. As the network and development of multimedia technologies are becoming more popular, users are not satisfied with the traditional information retrieval techniques. So nowadays content-based image retrieval (CBIR) is becoming a source of exact and fast retrieval. In recent years, a variety of techniques have been developed to improve the performance of CBIR. Data clustering is an unsupervised method for extracting hidden patterns from huge data sets. With large data sets, there is a possibility of high dimensionality. Achieving both accuracy and efficiency for high-dimensional data sets with an enormous number of samples is challenging. In this paper the clustering techniques are discussed and analysed. Also, we propose a method, HDK, that uses more than one clustering technique to improve the performance of CBIR. This method makes use of hierarchical and divide-and-conquer K-Means clustering techniques with equivalency and compatible relation concepts to improve the performance of K-Means for use in high-dimensional datasets. It also introduces features such as color, texture and shape for an accurate and effective retrieval system.