@MISC{Lee_rearrangingdata,
    author = {Gyesung Lee},
    title = {Rearranging Data Objects for Efficient and Stable Clustering},
    year = {}
}

Abstract

When a data mining algorithm derives a partitional structure from a data set, it is not unusual for the outcome to change when the same data are presented in a different order. This is known as the order bias problem. To overcome it, a first clustering pass constructs an initial partition, which is expected to indicate the possible range of the number of final clusters. We then apply center sorting to the data objects in the clusters of that partition to rearrange them in a new order, and reapply the same clustering procedure to the rearranged data set to build a new partition. We have developed an algorithm, REIT, that achieves both efficiency and reliability. A number of experiments show that the algorithm helps minimize order bias effects.
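
The cluster, rearrange, recluster loop described in the abstract can be sketched as follows. This is a minimal illustration and not the REIT algorithm itself: the abstract does not define "center sorting", so the sketch assumes it means ordering the members of each cluster by distance to that cluster's centroid. The order-dependent clusterer (`leader_cluster`), the `threshold` parameter, and all function names are hypothetical stand-ins chosen to make order bias visible on numeric points.

```python
from math import dist


def leader_cluster(points, threshold):
    """Order-dependent stand-in clusterer (an assumption, not COBWEB or REIT).

    Each point joins the first cluster whose leader (first member) lies
    within `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for p in points:
        for members in clusters:
            if dist(p, members[0]) <= threshold:
                members.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(members)
    return tuple(sum(coords) / n for coords in zip(*members))


def center_sort(clusters):
    """Assumed reading of 'center sorting': within each cluster, order the
    members by distance to the centroid (nearest first), then concatenate
    the clusters into one new presentation order."""
    ordered = []
    for members in clusters:
        c = centroid(members)
        ordered.extend(sorted(members, key=lambda p: dist(p, c)))
    return ordered


def rearrange_and_recluster(points, threshold):
    """First pass, rearrangement, then a second pass with the same procedure."""
    initial = leader_cluster(points, threshold)
    reordered = center_sort(initial)
    return leader_cluster(reordered, threshold)


# Order bias with the stand-in clusterer: same three points, different order.
a = leader_cluster([(0.0, 0.0), (2.0, 0.0), (1.0, 0.0)], 1.5)
b = leader_cluster([(1.0, 0.0), (0.0, 0.0), (2.0, 0.0)], 1.5)
```

Here `a` and `b` hold the same three points but a different number of clusters, which is the order bias the abstract refers to. Whether a rearrangement of this kind reduces the effect for a given clusterer is an empirical question that the sketch does not settle.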

...ms caused by the order of data are well analyzed and several enhancements have been made to reduce the effect of the order problem. In this paper, we study the conceptual clustering algorithm, COBWEB [4], and its order bias issues. We devise an algorithm to minimize the effect of the order bias. To evaluate the performance of the algorithm we compare it with COBWEB and another derivative of COBWEB. 2...

...al learner because when a data object is input, nodes on the tree are changed. In some cases it may change the entire structure of the tree considerably. COBWEB uses the category utility (CU) measure [2] as the criterion function for determining partitions in the hierarchy. CU was used as a basis for incremental clustering. However, COBWEB also has several limitations in building its clustering tree ...
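
The excerpt above names category utility without stating it. As a hedged illustration, the form commonly given for nominal attributes is computed below: for a partition into K clusters, CU = (1/K) Σ_k P(C_k) [Σ_i Σ_j P(A_i = V_ij | C_k)² − Σ_i Σ_j P(A_i = V_ij)²]. The function names and the tuple-of-attribute-values representation are assumptions made for this sketch, and every cluster is assumed to be non-empty.

```python
from collections import Counter


def sum_sq_probs(objects):
    """Sum over attributes i and values j of P(A_i = V_ij)^2,
    with probabilities estimated as relative frequencies in `objects`."""
    n = len(objects)
    total = 0.0
    for column in zip(*objects):  # one column per attribute
        counts = Counter(column)
        total += sum((c / n) ** 2 for c in counts.values())
    return total


def category_utility(partition):
    """Category utility of a partition given as a list of non-empty clusters,
    each cluster a list of equal-length tuples of nominal attribute values."""
    all_objects = [obj for cluster in partition for obj in cluster]
    n = len(all_objects)
    baseline = sum_sq_probs(all_objects)  # expected score without clusters
    gain = sum(
        len(cluster) / n * (sum_sq_probs(cluster) - baseline)
        for cluster in partition
    )
    return gain / len(partition)


separated = [
    [("red", "round"), ("red", "round")],
    [("blue", "square"), ("blue", "square")],
]
cu_separated = category_utility(separated)
cu_single = category_utility([separated[0] + separated[1]])
```

For the two perfectly separated clusters in `separated`, this gives a CU of 0.5, while placing all four objects in a single cluster gives 0, since clustering then adds no predictive information over the baseline.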

...erators that this method adopted is the redistribution operator that could shift the subtree containing a set of data objects from one place in the hierarchy to another. This method is different from [3] in that an individual data object (rather than a group of data objects) is redistributed to a better place. Both methods look for a possible reduction of the order bias effect by redistributing wron...

...ation of bias. The choice of the evaluation function changes the final outcomes because the bias built into the evaluation function favors certain types of patterns [1].

...number of clusters and a larger size of clusters. This can be problematic when the size of the data is large and the number of possible clusters varies. We are now investigating the Bayesian approach [5] to determine the relationship between the number of clusters that best fits the description of the data set and the quality of clustering in terms of a specific evaluation measure. We will furth...