Iterative Optimization and Simplification of Hierarchical Clusterings
- Journal of Artificial Intelligence Research
, 1995
Cited by 96 (1 self)
Clustering is often used for discovering structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and the control strategy used to search the space of clusterings. Ideally, the search strategy should consistently construct clusterings of high quality, but be computationally inexpensive as well. In general, we cannot have it both ways, but we can partition the search so that a system inexpensively constructs a 'tentative' clustering for initial examination, followed by iterative optimization, which continues to search in the background for improved clusterings. Given this motivation, we evaluate an inexpensive strategy for creating initial clusterings, coupled with several control strategies for iterative optimization, each of which repeatedly modifies an initial clustering in search of a better one. One of these methods appears novel as an iterative optimization strategy in clustering contexts. Once a clustering has been construct...
Iterate: A conceptual clustering algorithm for data mining
- IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS
, 1998
Cited by 17 (0 self)
The data exploration task can be divided into three interrelated subtasks: (i) feature selection, (ii) discovery, and (iii) interpretation. This paper describes an unsupervised discovery method with biases geared toward partitioning objects into clusters that improve interpretability. The algorithm, ITERATE, employs: (i) a data ordering scheme and (ii) an iterative redistribution operator to produce maximally cohesive and distinct clusters. Cohesion or intra-class similarity is measured in terms of the match between individual objects and their assigned cluster prototype. Distinctness or inter-class dissimilarity is measured by an average of the variance of the distribution match between clusters. We demonstrate that interpretability, from a problem solving viewpoint, is addressed by the intra- and inter-class measures. Empirical results demonstrate the properties of the discovery algorithm, and its applications to problem solving.
Dependency-Based Feature Selection for Clustering Symbolic Data
, 2000
Cited by 11 (0 self)
Feature selection is a central problem in data analysis that has received a significant amount of attention from several disciplines, such as machine learning and pattern recognition. However, most of the research has addressed supervised tasks, paying little attention to unsupervised learning. In this paper, we introduce an unsupervised feature selection method for symbolic clustering tasks. Our method is based upon the assumption that, in the absence of class labels, we can deem as irrelevant those features that exhibit low dependencies with the rest of the features. Experiments with several data sets demonstrate that the proposed approach is able to detect completely irrelevant features and that, additionally, it removes other features without significantly hurting the performance of the clustering algorithm. Key words: Feature selection, clustering, data preprocessing.
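The abstract does not say which dependency measure the method uses, so the following is only a minimal sketch of the general idea, assuming mutual information as the measure. The function names, the toy data, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length symbolic columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def dependency_scores(columns):
    """Average mutual information of each feature with every other feature."""
    scores = {}
    for name, values in columns.items():
        others = [v for k, v in columns.items() if k != name]
        scores[name] = sum(mutual_information(values, o) for o in others) / len(others)
    return scores


def select_features(columns, threshold):
    """Keep features whose average dependency reaches the (assumed) threshold."""
    scores = dependency_scores(columns)
    return [name for name, score in scores.items() if score >= threshold]


# Made-up toy data: "color" and "shape" determine each other,
# while "noise" is statistically independent of both.
toy = {
    "color": ["r", "r", "b", "b", "r", "r", "b", "b"],
    "shape": ["sq", "sq", "ci", "ci", "sq", "sq", "ci", "ci"],
    "noise": ["x", "y", "x", "y", "x", "y", "x", "y"],
}
kept = select_features(toy, threshold=0.1)
```

In this toy case the independent `noise` column scores zero and is dropped, while the two mutually dependent columns are kept. A real symbolic data set would need a principled way to choose the threshold, which this sketch does not attempt.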
Iterate: A conceptual clustering method for knowledge discovery in databases
- In Braunschweig, B., & Day, R. (Eds.), Innovative Applications of Artificial Intelligence in the Oil and Gas Industry
, 1995
Hierarchical Taxonomies using Divisive Partitioning
, 1998
Cited by 8 (3 self)
We propose an unsupervised divisive partitioning algorithm for document data sets which enjoys many favorable properties. In particular, the algorithm shows excellent scalability to large data collections and produces high quality clusters which are competitive with other clustering methods. The algorithm yields information on the significant and distinctive words within each cluster, and these words can be inserted into the naturally occurring hierarchical structure produced by the algorithm. The result is an automatically generated hierarchical topical taxonomy of a document set. In this paper, we show how the algorithm's cost scales up linearly with the size of the data, illustrate experimentally the quality of the clusters produced, and show how the algorithm can produce a hierarchical topical taxonomy.
Preliminary System Design for an EDA Assistant
- In Preliminary Papers of the Fifth International Workshop on AI and Statistics
, 1995
Cited by 7 (7 self)
oach with a simple example. Much of our research deals with the behavior of AI planners in demanding simulation environments. One such system is TransSim, a transportation planner/simulator [6]. In an early experiment we examined the relationship between the costs of two resources, port cost (P) and ship cost (S), measured over the duration of a trial. Figure 1a shows the sorted values of S for the 107 trials of the experiment, Figure 1b the relationship between P and S (denoted ⟨P, S⟩). We begin with summary statistics for the variable S (Figure 1a): the mean is about 31, the median 30, the interquartile range 9.5, and there is a slight skew toward lower values. More significantly, there are three clear gaps that separate the data into four clusters. Our preliminary partial description of S comprises the statistics and our observations about the clustering. [Figure 1: Examples. (a) Observation (x) vs. S (y); (b) P (x) vs. S (y).] When we turn to the relationship h
Experiments with Domain Knowledge in Knowledge Discovery
, 1997
Cited by 6 (3 self)
Using domain knowledge in unsupervised learning has been shown to be a useful strategy when the set of examples of a given domain has no evident structure or presents some level of noise. This background knowledge can be expressed as a set of classification rules and introduced as a semantic bias during the learning process. In this work we present some experiments on the use of partial domain knowledge with the tool LINNEO+, a conceptual clustering algorithm. The domain knowledge (or domain theory) is used to select a set of examples that will be used to start the learning process; this knowledge need be neither complete nor consistent. This bias will increase the quality of the final groups and reduce the effect of the order of the examples. Some measures of stability of classification are used. This technique is applied to identify operational situations in the functioning of an urban wastewater treatment plant. 1 Introduction The use of unsupervised learning to discover usef...
Unsupervised Learning of Spatial Regularities
- [Online]. Available: http://citeseer.nj.nec.com/1418.html
, 1995
Cited by 1 (0 self)
This paper examines the task of remote-sensing image analysis as an unsupervised learning task. Images are usually (very) large, and represent complex objects. Unsupervised learning, or clustering, may be of great help at several phases of the analysis. First, this paper describes a clustering algorithm. Then, the application of this algorithm to the segmentation phase is demonstrated. It is then argued that radiometry is insufficient to fully understand the scene in thematic terms. The next level of complexity is related to the incorporation of spatial information. This paper shows how this kind of data can be expressed. Clustering is then extended to deal with such complex, structured data. Experiments are provided to assess the validity of the approach. The set of experiments proves that clustering is a fundamental tool in remote-sensing image analysis, and that its scope may well be larger than was initially expected. 1. INTRODUCTION Machine learning (ML) is a sub-field of artifici...
The Evaluation and Comparative Study with a New Clustered Based Machine Learning Algorithm
, 2004
In this paper, a clustering-based machine learning algorithm called Clustering Algorithm System (CAS) is introduced. The CAS algorithm is tested to evaluate its performance and find fruitful results. We present some heuristics to help machine-learning authors advance their research. The InfoBase of the Ministry of Civil Services is used to analyze the CAS algorithm. The CAS algorithm was compared with other machine learning algorithms such as UNIMEM, COBWEB, and CLASSIT and was found to have some advantages over them. The proposed algorithm combines the advantages of two different approaches to machine learning. The first approach is learning from examples: CAS supports single and multiple inheritance and exceptions. CAS also avoids probability assumptions which are well understood in concept formation. The second approach is learning by observation: CAS applies a set of operators that have proven to be effective in conceptual clustering. We show how CAS builds and searches through a cluster hierarchy to incorporate or characterize an object.

