Cluster Ensembles - A Knowledge Reuse Framework for Combining Multiple Partitions
Journal of Machine Learning Research, 2002
Cited by 272 (16 self)
This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. We first identify several application scenarios for the resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, we propose three effective and efficient techniques for obtaining high-quality combiners (consensus functions). The first combiner induces a similarity measure from the partitionings and then reclusters the objects. The second combiner is based on hypergraph partitioning. The third one collapses groups of clusters into meta-clusters which then compete for each object to determine the combined clustering. Due to the low computational costs of our techniques, it is quite feasible to use a supra-consensus function that evaluates all three approaches against the objective function and picks the best solution for a given situation. We evaluate the effectiveness of cluster ensembles in three qualitatively different application scenarios: (i) where the original clusters were formed based on non-identical sets of features, (ii) where the original clustering algorithms worked on non-identical sets of objects, and (iii) where a common data-set is used and the main purpose of combining multiple clusterings is to improve the quality and robustness of the solution. Promising results are obtained in all three situations for synthetic as well as real data-sets.
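The first combiner described above (induce a similarity from the partitionings, then recluster) can be illustrated with a minimal sketch. It assumes the similarity between two objects is the fraction of partitionings that place them in the same cluster. The reclustering step here is a deliberately simple stand-in (connected components over a similarity threshold), not the method used in the paper; the function names and the `threshold` parameter are hypothetical.

```python
from itertools import combinations

def co_association(partitionings):
    """Fraction of partitionings in which each pair of objects shares a cluster."""
    n = len(partitionings[0])
    r = len(partitionings)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        agree = sum(1 for labels in partitionings if labels[i] == labels[j])
        sim[i][j] = sim[j][i] = agree / r
    return sim

def recluster(sim, threshold=0.5):
    """Stand-in reclustering (an assumption, not the paper's method):
    connected components of the graph joining pairs with similarity > threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and sim[u][v] > threshold:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels

# Three partitionings of six objects; only the label vectors are used,
# never the original features or algorithms.
partitionings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as the first, different label names
    [0, 0, 1, 1, 2, 2],
]
sim = co_association(partitionings)
consensus = recluster(sim)
print(consensus)
```

Because the similarity depends only on whether two labels match, the arbitrary naming of clusters in each input partitioning does not matter.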
ROCK: A Robust Clustering Algorithm for Categorical Attributes
In Proc. of the 15th Int. Conf. on Data Engineering, 2000
Cited by 262 (2 self)
Clustering, in data mining, is useful to discover distribution patterns in the underlying data. Clustering algorithms usually employ a similarity measure based on a distance metric (e.g., Euclidean) in order to partition the database such that data points in the same partition are more similar than points in different partitions. In this paper, we study clustering algorithms for data with boolean and categorical attributes. We show that traditional clustering algorithms that use distances between points for clustering are not appropriate for boolean and categorical attributes. Instead, we propose a novel concept of links to measure the similarity/proximity between a pair of data points. We develop a robust hierarchical clustering algorithm ROCK that employs links and not distances when merging clusters.
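The abstract does not spell out how links are computed. A minimal sketch, assuming the commonly cited definition: two points are neighbors when their Jaccard similarity is at least a threshold `theta`, and the link count of a pair is the number of neighbors they share. The sketch also assumes a point is not counted as its own neighbor; the function names and the sample baskets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of categorical items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def neighbors(points, theta):
    """For each point, the indices of other points with similarity >= theta."""
    n = len(points)
    return [
        {j for j in range(n) if j != i and jaccard(points[i], points[j]) >= theta}
        for i in range(n)
    ]

def links(points, theta):
    """Number of common neighbors for every pair (i, j) with i < j."""
    nbrs = neighbors(points, theta)
    n = len(points)
    return {(i, j): len(nbrs[i] & nbrs[j]) for i in range(n) for j in range(i + 1, n)}

baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread", "butter"},
    {"milk", "eggs", "butter"},
    {"beer", "chips", "salsa"},
    {"beer", "chips", "nuts"},
    {"beer", "salsa", "nuts"},
]
link_counts = links(baskets, theta=0.5)
print(link_counts[(0, 1)], link_counts[(0, 3)])
```

A pair from the same group shares a neighbor, while a pair from different groups shares none, which is the kind of neighborhood evidence a merge criterion based on links can use instead of a pairwise distance.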
Survey of clustering data mining techniques
2002
Cited by 177 (0 self)
Accrue Software, Inc.
Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique
The Ispd98 Circuit Benchmark Suite
Proc. ACM/IEEE International Symposium on Physical Design, April 1998
Cited by 112 (1 self)
From 1985 to 1993, the MCNC regularly introduced and maintained circuit benchmarks for use by the Design Automation community. However, during the last five years, no new circuits have been introduced that can be used for developing fundamental physical design applications, such as partitioning and placement. The largest circuit in the existing set of benchmark suites has over 100,000 modules, but the second largest has just over 25,000 modules, which is small by today's standards. This paper introduces the ISPD98 benchmark suite which consists of 18 circuits with sizes ranging from 13,000 to 210,000 modules. Experimental results for three existing partitioners are presented so that future researchers in partitioning can more easily evaluate their heuristics.
1 Introduction
For over a decade, the Design Automation (DA) community has heavily relied on circuit benchmark suites to compare and validate their algorithms. Hundreds and perhaps thousands of publications have presented experiment...
Multilevel k-way Hypergraph Partitioning
1999
Cited by 97 (6 self)
In this paper, we present a new multilevel k-way hypergraph partitioning algorithm that substantially outperforms the existing state-of-the-art K-PM/LR algorithm for multiway partitioning, both for optimizing local as well as global objectives. Experiments on
Clustering Based On Association Rule Hypergraphs
Cited by 80 (16 self)
Clustering in data mining is a discovery process that groups a set of data such that the intracluster similarity is maximized and the intercluster similarity is minimized. These discovered clusters are used to explain the characteristics of the data distribution. In this paper we propose a new methodology for clustering related items using association rules, and clustering related transactions using clusters of items. Our approach is linearly scalable with respect to the number of transactions. The frequent item-sets used to derive association rules are also used to group items into a hypergraph edge, and a hypergraph partitioning algorithm is used to find the clusters. Our experiments indicate that clustering using association rule hypergraphs holds great promise in several application domains. Our experiments with stock-market data and congressional voting data show that this clustering scheme is able to successfully group items that belong to the same group. Clustering of items can ...
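The step of turning frequent item-sets into hypergraph edges can be sketched as follows. This is a minimal illustration under stated assumptions: item-sets are found by brute-force enumeration rather than an Apriori-style search, each hyperedge is weighted by its raw support count as a stand-in for whatever weighting the authors use, and the hypergraph partitioning step is omitted. The function name, `min_support`, and `max_size` are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Map each item-set of size 2..max_size with support >= min_support
    to its support count. Brute force, for illustration only."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(2, max_size + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support:
                frequent[candidate] = support
    return frequent

transactions = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"D", "E"},
    {"D", "E"},
]
# Each frequent item-set becomes one hyperedge over the items it contains.
hyperedges = frequent_itemsets(transactions, min_support=2)
for edge, weight in hyperedges.items():
    print(edge, weight)
```

In this toy data no hyperedge connects {A, B, C} with {D, E}, so any partitioner that avoids cutting hyperedges would separate the items into those two groups.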
Document Categorization and Query Generation on the World Wide Web Using WebACE
AI Review, 1999
Cited by 71 (25 self)
We present WebACE, an agent for exploring and categorizing documents on the World Wide Web based on a user profile. The heart of the agent is an unsupervised categorization of a set of documents, combined with a process for generating new queries that is used to search for new related documents and for filtering the resulting documents to extract the ones most closely related to the starting set. The document categories are not given a priori. We present the overall architecture and describe two novel algorithms which provide significant improvement over traditional clustering algorithms and form the basis for the query generation and search component of the agent. We report on the results of our experiments comparing these new algorithms with more traditional clustering algorithms and we show that our algorithms are fast and scalable.
An Interconnect-Centric Design Flow for Nanometer Technologies
Proceedings of the IEEE, 1999
Cited by 58 (23 self)
As IC devices are scaled into nanometer dimensions and operate at gigahertz frequencies, interconnect design and optimization have become critical in determining system performance and reliability.
WebACE: A Web Agent for Document Categorization and Exploration
1998
Cited by 57 (16 self)
We propose an agent for exploring and categorizing documents on the World Wide Web. The heart of the agent is an automatic categorization of a set of documents, combined with a process for generating new queries used to search for new related documents and filtering the resulting documents to extract the set of documents most closely related to the starting set. The document categories are not given a priori. We present the overall architecture and describe two novel algorithms which provide significant improvement over traditional clustering algorithms and form the basis for the query generation and search component of the agent.
1 Introduction
The World Wide Web is a vast resource of information and services that continues to grow rapidly. Powerful search engines have been developed to aid in locating unfamiliar documents by category, contents, or subject. Relying on large indexes to documents located on the Web, search engines determine the URLs of those documents satisfying a use...

