Results 1-10 of 109
Data Clustering: A Review
ACM Computing Surveys, 1999
Cited by 1284 (13 self)
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.
A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining
In Research Issues on Data Mining and Knowledge Discovery, 1997
Cited by 82 (2 self)
Partitioning a large set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means algorithm is best suited for implementing this operation because of its efficiency in clustering large data sets. However, working only on numeric values limits its use in data mining because data sets in data mining often contain categorical values. In this paper we present an algorithm, called k-modes, to extend the k-means paradigm to categorical domains. We introduce new dissimilarity measures to deal with categorical objects, replace means of clusters with modes, and use a frequency-based method to update modes in the clustering process to minimise the clustering cost function. Tested with the well-known soybean disease data set, the algorithm has demonstrated a very good classification performance. Experiments on a very large health insurance data set consisting of half a million records and 34 categorical attributes show that the algorithm is scalable in terms of ...
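The k-modes idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes simple matching (the count of differing attributes) as the dissimilarity measure, takes the initial modes as an argument, and recomputes modes in batch after each full pass rather than after each allocation. The function names are hypothetical.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ (assumed measure)."""
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Most frequent category per attribute among the records of one cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(records, initial_modes, max_iter=10):
    """Batch k-modes: assign each record to its nearest mode, then recompute the modes."""
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = cluster_mode(members)
    return labels, modes
```

Calling `k_modes(records, initial_modes)` on a list of equal-length tuples returns one cluster label per record plus the final modes. Replacing the mean with a per-attribute mode is what lets the k-means loop run on data that has no arithmetic average.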
A decision theoretic framework for approximating concepts
International Journal of Man-Machine Studies, 1992
Cited by 36 (20 self)
This paper explores the implications of approximating a concept based on the Bayesian decision procedure, which provides a plausible unification of the fuzzy set and rough set approaches for approximating a concept. We show that if a given concept is approximated by one set, the same result given by the α-cut in fuzzy set theory is obtained. On the other hand, if a given concept is approximated by two sets, we can derive both the algebraic and probabilistic rough set approximations. Moreover, based on the well-known principle of maximum (minimum) entropy, we give a useful interpretation of fuzzy intersection and union. Our results enhance the understanding and broaden the applications of both fuzzy and rough sets.
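For readers unfamiliar with the α-cut mentioned above, here is a small Python sketch of the standard definition: the crisp set of elements whose membership grade is at least α. It shows only the one-set approximation, not the paper's Bayesian derivation, and the membership grades are made up for the example.

```python
def alpha_cut(membership, alpha):
    """Return the crisp set of elements whose membership grade is at least alpha."""
    return {x for x, grade in membership.items() if grade >= alpha}


# Hypothetical membership grades for a fuzzy concept such as "tall".
tall = {"ann": 0.9, "bob": 0.6, "cy": 0.3}
print(sorted(alpha_cut(tall, 0.5)))
```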
A systematic approach to the assessment of fuzzy association rules
Data Mining and Knowledge Discovery, 2006
Cited by 30 (6 self)
In order to allow for the analysis of data sets including numerical attributes, several generalizations of association rule mining based on fuzzy sets have been proposed in the literature. While the formal specification of fuzzy associations is more or less straightforward, the assessment of such rules by means of appropriate quality measures is less obvious. Particularly, it assumes an understanding of the semantic meaning of a fuzzy rule. This aspect has been ignored by most existing proposals, which must therefore be considered as ad hoc to some extent. In this paper, we develop a systematic approach to the assessment of fuzzy association rules. To this end, we proceed from the idea of partitioning the data stored in a database into examples of a given rule, counterexamples, and irrelevant data. Evaluation measures are then derived from the cardinalities of the corresponding subsets. The problem of finding a proper partition has a rather obvious solution for standard association rules but becomes less trivial in the fuzzy case. Our results not only provide a sound justification for commonly used measures but also suggest a means for constructing meaningful alternatives.
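The partitioning idea is easiest to see in the standard (non-fuzzy) case, which the abstract calls the obvious one. The Python sketch below covers only that crisp case, with transactions modelled as sets of items. The function names are hypothetical, and the fuzzy generalization developed in the paper is not attempted here.

```python
def partition(transactions, antecedent, consequent):
    """Split transactions into examples, counterexamples, and irrelevant data for A -> B."""
    examples, counterexamples, irrelevant = [], [], []
    for t in transactions:
        if not antecedent <= t:      # A does not hold: the rule says nothing about t
            irrelevant.append(t)
        elif consequent <= t:        # A and B both hold
            examples.append(t)
        else:                        # A holds but B does not
            counterexamples.append(t)
    return examples, counterexamples, irrelevant


def support_confidence(transactions, antecedent, consequent):
    """Derive support and confidence from the cardinalities of the partition."""
    ex, cx, _ = partition(transactions, antecedent, consequent)
    support = len(ex) / len(transactions)
    covered = len(ex) + len(cx)
    confidence = len(ex) / covered if covered else 0.0
    return support, confidence
```

Support is the share of examples among all transactions, and confidence is the share of examples among the transactions the rule actually applies to.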
Measurement Of Membership Functions: Theoretical And Empirical Work
1995
Cited by 21 (1 self)
This chapter presents a review of various interpretations of the fuzzy membership function together with ways of obtaining a membership function. We emphasize that different interpretations of the membership function call for different elicitation methods. We try to make this distinction clear using techniques from measurement theory.
Improving Clustering Technique for Functional Approximation Problem Using Fuzzy Logic: ICFA algorithm
Lecture Notes in Computer Science, 2005
Cited by 17 (11 self)
To date, clustering techniques have always been oriented to solve classification and pattern recognition problems. However, some authors have applied them unchanged to construct initial models for function approximators. Nevertheless, classification and function approximation problems present quite different objectives. Therefore it is necessary to design new clustering algorithms specialized in the problem of function approximation. This paper presents a new clustering technique, specially designed for function approximation problems, which improves the performance of the approximator system obtained, compared with other models derived from traditional classification-oriented clustering algorithms and input–output clustering techniques. Index Terms: Clustering techniques, function approximation, model initialization.
A systematic approach to a self-generating fuzzy rule-table for function approximation
IEEE Trans. Syst., Man, Cybern., 2000
Cited by 16 (10 self)
In this paper, a systematic design is proposed to determine a fuzzy system's structure and learn its parameters from a set of given training examples. In particular, two fundamental problems concerning fuzzy system modeling are addressed: 1) fuzzy rule parameter optimization and 2) the identification of system structure (i.e., the number of membership functions and fuzzy rules). A four-step approach to build a fuzzy system automatically is presented: Step 1 directly obtains the optimum fuzzy rules for a given membership function configuration. Step 2 optimizes the allocation of the membership functions and the conclusion of the rules, in order to achieve a better approximation. Step 3 determines a new and more suitable topology with the information derived from the approximation error distribution; it decides which variables should increase the number of membership functions. Finally, Step 4 determines which structure should be selected to approximate the function, from the possible configurations provided by the algorithm in the three previous steps. The results of applying this method to the problem of function approximation are presented and then compared with other methodologies proposed in the bibliography. Index Terms: Function approximation, fuzzy system construction, fuzzy system design, knowledge acquisition.
On neurobiological, neuro-fuzzy, machine learning, and statistical pattern recognition techniques
IEEE Trans. Neural Networks, 1997
Cited by 16 (1 self)
In this paper, we propose two new neuro-fuzzy schemes, one for classification and one for clustering problems. The classification scheme is based on Simpson’s fuzzy min–max method and relaxes some assumptions he makes. This enables our scheme to handle mutually nonexclusive classes. The neuro-fuzzy clustering scheme is a multiresolution algorithm that is modeled after the mechanics of human pattern recognition. We also present data from an exhaustive comparison of these techniques with neural, statistical, machine learning, and other traditional approaches to pattern recognition applications. The data sets used for comparisons include those from the machine learning repository at the University of California, Irvine. We find that our proposed schemes compare quite well with the existing techniques, and in addition offer the advantages of one-pass learning and online adaptation. Index Terms: Pattern recognition, classification, clustering, neuro-fuzzy systems, multiresolution, vision systems, overlapping
General Purpose Database Summarization
2005
Cited by 16 (5 self)
In this paper, a message-oriented architecture for large database summarization is presented.
Clustering Uncertain Data Using Voronoi Diagrams
Cited by 15 (2 self)
We study the problem of clustering uncertain objects whose locations are described by probability density functions (pdf). We show that the UK-means algorithm, which generalises the k-means algorithm to handle uncertain objects, is very inefficient. The inefficiency comes from the fact that UK-means computes expected distances (ED) between objects and cluster representatives. For arbitrary pdfs, expected distances are computed by numerical integrations, which are costly operations. We propose pruning techniques that are based on Voronoi diagrams to reduce the number of expected distance calculations. These techniques are analytically proven to be more effective than the basic bounding-box-based technique previously known in the literature. We conduct experiments to evaluate the effectiveness of our pruning techniques and to show that our techniques significantly outperform previous methods.
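The Python sketch below shows the cost the abstract refers to. It is the brute-force assignment step, not the Voronoi-based pruning the paper proposes. It assumes each object's pdf has been discretised into equally weighted sample points, which stands in for the numerical integration, and the function names are hypothetical.

```python
import math


def expected_distance(samples, representative):
    """Approximate the expected Euclidean distance from an uncertain object
    (given as equally weighted sample points of its pdf) to one representative."""
    return sum(math.dist(s, representative) for s in samples) / len(samples)


def assign(objects, representatives):
    """Brute-force assignment: evaluate the expected distance for every
    object/representative pair and pick the closest representative."""
    return [
        min(range(len(representatives)),
            key=lambda j: expected_distance(samples, representatives[j]))
        for samples in objects
    ]
```

Every object/representative pair triggers a full pass over that object's samples, so the work grows with objects × representatives × samples on each iteration. Pruning aims to skip most of these evaluations.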