Results 1–9 of 9
Approximation Algorithms for Projective Clustering
 In Proceedings of the ACM SIGMOD International Conference on Management of Data, Philadelphia
, 2000
Abstract

Cited by 246 (21 self)
We consider the following two instances of the projective clustering problem: Given a set S of n points in R^d and an integer k > 0, cover S by k hyperstrips (resp. hypercylinders) so that the maximum width of a hyperstrip (resp. the maximum diameter of a hypercylinder) is minimized. Let w* be the smallest value so that S can be covered by k hyperstrips (resp. hypercylinders), each of width (resp. diameter) at most w*. In the plane, the two problems are equivalent. It is NP-hard to compute k planar strips of width even at most Cw*, for any constant C > 0 [50]. This paper contains four main results related to projective clustering: (i) For d = 2, we present a randomized algorithm that computes O(k log k) strips of width at most 6w* that cover S. Its expected running time is O(nk^2 log^4 n) if k^2 log k <= n; it also works for larger values of k, but then the expected running time is O(n^{2/3} k^{8/3} log^4 n). We also propose another algorithm that computes a c...
Clustering with instance-level constraints
 In Proceedings of the Seventeenth International Conference on Machine Learning
, 2000
Abstract

Cited by 150 (6 self)
One goal of research in artificial intelligence is to automate tasks that currently require human expertise; this automation is important because it saves time and brings problems that were previously too large to be solved into the feasible domain. Data analysis, or the ability to identify meaningful patterns and trends in large volumes of data, is an important task that falls into this category. Clustering algorithms are a particularly useful group of data analysis tools. These methods are used, for example, to analyze satellite images of the Earth to identify and categorize different land and foliage types or to analyze telescopic observations to determine what distinct types of astronomical bodies exist and to categorize each observation. However, most existing clustering methods apply general similarity techniques rather than making use of problem-specific information. This dissertation first presents a novel method for converting existing clustering algorithms into constrained clustering algorithms. The resulting methods are able to accept domain-specific information in the form of constraints on the output clusters. At the most general level, each constraint is an instance-level statement
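The instance-level constraints described in this abstract are typically must-link and cannot-link statements. A minimal sketch of how a clustering algorithm can check them during assignment (in the style of COP-KMeans-like methods; this is an illustration, not the dissertation's actual algorithm, and all names here are hypothetical):

```python
def violates_constraints(point, cluster, assignment, must_link, cannot_link):
    """Return True if placing `point` into `cluster` breaks any
    instance-level constraint. `assignment` maps already-placed
    points to their clusters."""
    for a, b in must_link:
        other = b if a == point else a if b == point else None
        # A must-link partner already sits in a different cluster.
        if other in assignment and assignment[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == point else a if b == point else None
        # A cannot-link partner already sits in this cluster.
        if other in assignment and assignment[other] == cluster:
            return True
    return False

# Example: points 0 and 1 must share a cluster; 0 and 2 must not.
must_link, cannot_link = [(0, 1)], [(0, 2)]
assignment = {0: "A"}
print(violates_constraints(1, "B", assignment, must_link, cannot_link))  # → True
print(violates_constraints(2, "B", assignment, must_link, cannot_link))  # → False
```

A constrained variant of an existing algorithm can then simply skip (or penalize) any assignment for which this check returns True.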
Iterative Optimization and Simplification of Hierarchical Clusterings
 Journal of Artificial Intelligence Research
, 1995
Abstract

Cited by 103 (1 self)
Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search strategy should consistently construct clusterings of high quality, but be computationally inexpensive as well. In general, we cannot have it both ways, but we can partition the search so that a system inexpensively constructs a 'tentative' clustering for initial examination, followed by iterative optimization, which continues to search in background for improved clusterings. Given this motivation, we evaluate an inexpensive strategy for creating initial clusterings, coupled with several control strategies for iterative optimization, each of which repeatedly modifies an initial clustering in search of a better one. One of these methods appears novel as an iterative optimization strategy in clustering contexts. Once a clustering has been construct...
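The "cheap initial clustering followed by background iterative optimization" idea can be sketched with a toy hill-climber over flat clusterings (a simplified stand-in using a sum-of-squared-error objective, not the paper's hierarchical method; all names are hypothetical):

```python
import random

def sse(clusters):
    """Sum of squared distances of 1-D points to their cluster means."""
    total = 0.0
    for pts in clusters.values():
        if pts:
            mean = sum(pts) / len(pts)
            total += sum((p - mean) ** 2 for p in pts)
    return total

def iterative_optimize(points, k, iters=200, seed=0):
    """Cheap random initial clustering, then repeated single-point
    moves, keeping only moves that do not worsen the objective."""
    rng = random.Random(seed)
    clusters = {c: [] for c in range(k)}
    for p in points:                      # inexpensive 'tentative' clustering
        clusters[rng.randrange(k)].append(p)
    best = sse(clusters)
    for _ in range(iters):                # iterative optimization phase
        src = rng.randrange(k)
        if not clusters[src]:
            continue
        dst = rng.randrange(k)
        p = clusters[src].pop(rng.randrange(len(clusters[src])))
        clusters[dst].append(p)
        new = sse(clusters)
        if new <= best:
            best = new
        else:                             # undo a worsening move
            clusters[dst].remove(p)
            clusters[src].append(p)
    return clusters, best

clusters, cost = iterative_optimize([1.0, 1.1, 1.2, 9.0, 9.1, 9.2], k=2)
```

The point of the split, as the abstract argues, is that the user can inspect the tentative clustering immediately while the optimizer keeps improving it.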
Semi-supervised Clustering with User Feedback
, 2003
Abstract

Cited by 100 (2 self)
We present a new approach to clustering based on the observation that "it is easier to criticize than to construct." Our approach of semi-supervised clustering allows a user to iteratively provide feedback to a clustering algorithm. The feedback is incorporated in the form of constraints which the clustering algorithm attempts to satisfy on future iterations. These constraints allow the user to guide the clusterer towards clusterings of the data that the user finds more useful. We demonstrate semi-supervised clustering with a system that learns to cluster news stories from a Reuters data set.
Introduction
Consider the following problem: you are given 100,000 text documents (e.g., papers, newsgroup articles, or web pages) and asked to group them into classes or into a hierarchy such that related documents are grouped together. You are not told what classes or hierarchy to use or what documents are related; you have some criteria in mind, but may not be able to say exactly w...
Clustering Based On Association Rule Hypergraphs
Abstract

Cited by 88 (16 self)
Clustering in data mining is a discovery process that groups a set of data such that the intra-cluster similarity is maximized and the inter-cluster similarity is minimized. These discovered clusters are used to explain the characteristics of the data distribution. In this paper we propose a new methodology for clustering related items using association rules, and clustering related transactions using clusters of items. Our approach is linearly scalable with respect to the number of transactions. The frequent itemsets used to derive association rules are also used to group items into a hypergraph edge, and a hypergraph partitioning algorithm is used to find the clusters. Our experiments indicate that clustering using association rule hypergraphs holds great promise in several application domains. Our experiments with stock-market data and congressional voting data show that this clustering scheme is able to successfully group items that belong to the same group. Clustering of items can ...
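The pipeline this abstract describes — frequent itemsets become weighted hyperedges, then a partitioner groups the items — can be illustrated with a heavily simplified sketch. It mines only frequent pairs (a stand-in for a full Apriori pass) and uses connected components as a stand-in for a real hypergraph partitioner such as hMETIS; all function names here are hypothetical:

```python
from collections import defaultdict
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Support of item pairs only (a simplified stand-in for Apriori)."""
    counts = defaultdict(int)
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def item_clusters(transactions, min_support=0.5):
    """Treat each frequent itemset as a hyperedge; group items into
    connected components of the edge set (stand-in for partitioning)."""
    edges = frequent_itemsets(transactions, min_support)
    parent = {}
    def find(x):                      # union-find with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for item in list(parent):
        groups[find(item)].add(item)
    return list(groups.values())

tx = [{"bread", "milk"}, {"bread", "milk", "beer"}, {"beer", "chips"},
      {"beer", "chips"}, {"bread", "milk"}]
print(item_clusters(tx, min_support=0.4))  # → [{'bread', 'milk'}, {'beer', 'chips'}]
```

A real implementation would mine itemsets of all sizes, weight each hyperedge (e.g., by average confidence of its rules), and hand the hypergraph to a min-cut partitioner.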
Conceptual Clustering with Numeric-and-Nominal Mixed Data - A New Similarity Based System
 In IEEE Transactions on Knowledge and Data Engineering
, 1998
Abstract

Cited by 5 (1 self)
This paper presents a new Similarity Based Agglomerative Clustering (SBAC) algorithm that works well for data with mixed numeric and nominal features. A similarity measure, proposed by Goodall for biological taxonomy [13], that gives greater weight to uncommon feature-value matches in similarity computations and makes no assumptions of the underlying distributions of the feature-values, is adopted to define the similarity measure between pairs of objects. An agglomerative algorithm is employed to construct a concept tree, and a simple distinctness heuristic is used to extract a partition of the data. The performance of SBAC has been studied on artificially generated data sets. Results demonstrate the effectiveness of this algorithm in unsupervised discovery tasks. Comparisons with other schemes illustrate the superior performance of the algorithm.
1 Introduction
The widespread use of computers and information technology has made extensive data collection in businesses, manufacturing, an...
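The key idea of the similarity measure — a match on a rare feature value says more than a match on a common one — can be sketched as follows. This is a simplified illustration of that idea, not the exact Goodall measure used in the paper, and the function name is hypothetical:

```python
from collections import Counter

def uncommon_match_similarity(data, i, j):
    """Similarity of rows i and j over nominal features: a match on a
    value with relative frequency p contributes 1 - p^2 (rare matches
    weigh more); a mismatch contributes 0. Averaged over features."""
    n_features = len(data[0])
    score = 0.0
    for f in range(n_features):
        freq = Counter(row[f] for row in data)
        if data[i][f] == data[j][f]:
            p = freq[data[i][f]] / len(data)
            score += 1 - p * p
    return score / n_features

rows = [("red", "round"), ("red", "round"), ("red", "square"), ("blue", "square")]
# Feature 1 match on "round" (p = 1/2) outweighs the feature 0 match
# on the common value "red" (p = 3/4).
print(uncommon_match_similarity(rows, 0, 1))  # → 0.59375
print(uncommon_match_similarity(rows, 0, 3))  # → 0.0
```

The actual measure also handles numeric features and makes the contribution of a match depend on the full empirical value distribution rather than this single quadratic term.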
A Lattice-Based Approach to Hierarchical Clustering
, 2001
Abstract

Cited by 2 (0 self)
The paper presents an approach to hierarchical clustering based on the use of a least general generalization (lgg) operator to induce a lattice structure of clusters and a category utility objective function to evaluate the clustering quality. The objective function is integrated with a lattice-based distance measure into a bottom-up control strategy for clustering. Experiments with well-known datasets are discussed.
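Category utility, the objective function mentioned here, rewards partitions whose clusters make attribute values more predictable than they are in the data as a whole. A minimal sketch for nominal data (standard category utility, computed directly from value frequencies; the helper name is hypothetical):

```python
from collections import Counter

def category_utility(clusters):
    """Category utility of a partition: (1/K) * sum over clusters k of
    P(C_k) * sum over features/values of [P(v|C_k)^2 - P(v)^2].
    `clusters` is a list of lists of equal-length tuples."""
    data = [row for c in clusters for row in c]
    n, n_features = len(data), len(data[0])
    base = 0.0                      # sum of P(v)^2 over all features/values
    for f in range(n_features):
        freq = Counter(row[f] for row in data)
        base += sum((c / n) ** 2 for c in freq.values())
    cu = 0.0
    for cluster in clusters:
        within = 0.0                # sum of P(v | cluster)^2
        for f in range(n_features):
            freq = Counter(row[f] for row in cluster)
            within += sum((c / len(cluster)) ** 2 for c in freq.values())
        cu += (len(cluster) / n) * (within - base)
    return cu / len(clusters)

pure = [[("a", "x"), ("a", "x")], [("b", "y"), ("b", "y")]]
mixed = [[("a", "x"), ("b", "y")], [("a", "x"), ("b", "y")]]
print(category_utility(pure) > category_utility(mixed))  # → True
```

Homogeneous clusters score higher, which is what lets the measure steer the bottom-up search over the lattice.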
Semi-supervised Clustering: Incorporating User Feedback to Improve
 In AAAI’00
, 2000
Abstract
We present a new approach to clustering based on the observation that "it is easier to criticize than to construct." Our approach, which we call semi-supervised clustering, allows a user to iteratively provide feedback to a clustering algorithm.
Identifying Qualitatively Different Experiences: Experiments with a Mobile Robot
Abstract
We present an unsupervised learning method that allows a situated embodied agent to identify and represent qualitatively different experiences. The occurrence of events, such as the initiation of a particular action, triggers the collection of multivariate time series of sensor values. Those time series are clustered using Dynamic Time Warping as a measure of similarity, and prototypes are extracted from the clusters. Each prototype represents a distinct experience, such as one possible outcome of engaging in an activity. Prototypes can be used for offline planning and for online prediction by finding the best partial match among the prototypes to current sensor readings. Experiments with a Pioneer 1 mobile robot demonstrate the utility of the approach with respect to capturing the structure and dynamics of a complex, real-world environment.
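Dynamic Time Warping, the similarity measure used above, aligns two series by allowing points to be stretched or compressed in time. A minimal sketch of the standard dynamic program for 1-D series (an illustration of the distance measure only, not the paper's full clustering pipeline):

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D series, via the
    standard O(len(a) * len(b)) dynamic program with |x - y| cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of: step in a, step in b, or step in both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# A time-stretched copy of a series stays at distance 0 under DTW
# even though the sequences differ point-for-point.
print(dtw_distance([0, 1, 2, 3], [0, 1, 1, 2, 3]))  # → 0.0
print(dtw_distance([0, 1, 2, 3], [3, 2, 1, 0]))     # positive
```

This invariance to local time stretching is what makes DTW suitable for comparing sensor traces of the same activity unfolding at different speeds.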