### Citations

2834 | Pattern Classification
- Duda, Hart, et al.
- 2000
Citation Context ...ine clustering, hybrid approaches have been considered such as in [5]. Graph-theoretic techniques have also been considered for clustering; many earlier hierarchical agglomerative clustering algorithms [9] and some recent work [3, 23] model the similarity between documents by a graph whose vertices correspond to documents and weighted edges or hyperedges give the similarity between vertices. However the...

774 | Scatter/Gather: A cluster-based approach to browsing large document collections
- Cutting, Karger, et al.
- 1992
Citation Context ...sed methods including LSA [21], self-organizing maps [18] and multidimensional scaling [16]. For computational efficiency required in on-line clustering, hybrid approaches have been considered such as in [5]. Graph-theoretic techniques have also been considered for clustering; many earlier hierarchical agglomerative clustering algorithms [9] and some recent work [3, 23] model the similarity between documen...

397 | Concept decompositions for large sparse text data using clustering
- Dhillon, Modha
- 2001
Citation Context ...Inderjit S. Dhillon, Department of Computer Sciences, University of Texas, Austin, TX 78712, inderjit@cs.utexas.edu ...ative clustering [25], the partitional k-means algorithm [7], projection based methods including LSA [21], self-organizing maps [18] and multidimensional scaling [16]. For computational efficiency required in on-line clustering, hybrid approaches have been consid...

350 | A linear time heuristic for improving network partitions
- Fiduccia, Mattheyses
- 1982
Citation Context ...parallel computation, etc. However it is well known that this problem is NP-complete [12]. But many effective heuristic methods exist, such as the Kernighan-Lin (KL) [17] and the Fiduccia-Mattheyses (FM) [10] algorithms. However, both the KL and FM algorithms search in the local vicinity of given initial partitionings and have a tendency to get stuck in local minima. 3.1 Spectral Graph Bipartitioning Spec...

295 | Distributional clustering of words for text classification
- Baker, McCallum
- 1998
Citation Context ...ying assumption is that words that typically appear together should be associated with similar concepts. Word clustering has also been profitably used in the automatic classification of documents, see [1]. More on word clustering may be found in [24]. In this paper, we consider the problem of simultaneous or co-clustering of documents and words. Most of the existing work is on one-way clustering, i.e....

180 | Lower Bounds for the Partitioning of Graphs
- Donath, Hoffman
- 1973
Citation Context ...tial partitionings and have a tendency to get stuck in local minima. 3.1 Spectral Graph Bipartitioning Spectral graph partitioning is another effective heuristic that was introduced in the early 1970s [15, 8, 11], and popularized in 1990 [19]. Spectral partitioning generally gives better global solutions than the KL or FM methods. We now introduce the spectral partitioning heuristic. Suppose the graph G = (V, ...

112 | Efficient Clustering of Very Large Document Collections
- Dhillon, Fan, et al.
- 2001
Citation Context ...he “true” class label for each document, the confusion matrix captures the goodness of document clustering. In addition, the measures of purity and entropy are easily derived from the confusion matrix [6]. Table 2 summarizes the results of applying Algorithm Bipartition to the MedCran data set. The confusion matrix at the top of the table shows that the document cluster D0 consists entirely of the Med...

97 | Document categorization and query generation on the World Wide Web using WebACE. AI Review (accepted for publication)
- Boley, Gini, et al.
- 1999
Citation Context ...pproaches have been considered such as in [5]. Graph-theoretic techniques have also been considered for clustering; many earlier hierarchical agglomerative clustering algorithms [9] and some recent work [3, 23] model the similarity between documents by a graph whose vertices correspond to documents and weighted edges or hyperedges give the similarity between vertices. However these methods are computational...

29 | A Cluster-based Approach to Thesaurus Construction
- Crouch
- 1988
Citation Context ...s. Words may be clustered on the basis of the documents in which they co-occur; such clustering has been used in the automatic construction of a statistical thesaurus and in the enhancement of queries [4]. The underlying assumption is that words that typically appear together should be associated with similar concepts. Word clustering has also been profitably used in the automatic classification of do...

14 | Hierarchical taxonomies using divisive partitioning
- Boley
- 1998
Citation Context ...l on small data sets, we also created subsets of Classic3 with 30 and 150 documents respectively. Our final data set is a collection of 2340 Reuters news articles downloaded from Yahoo in October 1997 [2]. The articles are from 6 categories: 142 from Business, 1384 from Entertainment, 494 from Health, 114 from Politics, 141 from Sports and 60 news articles from Technology. In the preprocessing, HTML t...