Data Clustering: A Review
ACM Computing Surveys, 1999
Cited by 912 (9 self)
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions and contexts across communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with the goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms, such as image segmentation, object recognition, and information retrieval.
A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining
In Research Issues on Data Mining and Knowledge Discovery, 1997
Cited by 70 (2 self)
Partitioning a large set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means algorithm is best suited for implementing this operation because of its efficiency in clustering large data sets. However, working only on numeric values limits its use in data mining, because data sets in data mining often contain categorical values. In this paper we present an algorithm, called k-modes, that extends the k-means paradigm to categorical domains. We introduce new dissimilarity measures to deal with categorical objects, replace the means of clusters with modes, and use a frequency-based method to update modes in the clustering process so as to minimise the clustering cost function. Tested on the well-known soybean disease data set, the algorithm demonstrated very good classification performance. Experiments on a very large health insurance data set consisting of half a million records and 34 categorical attributes show that the algorithm is scalable in terms of ...
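As an illustration of the k-modes idea summarized above, here is a minimal Python sketch. It assumes records are tuples of category strings, uses the simple-matching dissimilarity (the number of mismatched attributes), and performs a batch update in which each mode is recomputed from attribute-wise category frequencies after every full allocation pass. The names `matching_dissimilarity`, `column_mode`, and `k_modes` are hypothetical, and the paper's own procedure (initialization, update order, dissimilarity variants) may differ.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Record = Tuple[str, ...]


def matching_dissimilarity(x: Sequence[str], y: Sequence[str]) -> int:
    """Simple matching: the number of attributes on which x and y disagree."""
    return sum(1 for a, b in zip(x, y) if a != b)


def column_mode(records: List[Record]) -> Record:
    """Frequency-based mode: the most frequent category of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(data: List[Record], k: int, init: Optional[List[Record]] = None,
            max_iter: int = 100, seed: int = 0):
    """Batch k-modes sketch; returns (labels, modes, cost)."""
    # Initial modes: caller-supplied, or k records drawn at random.
    modes = list(init) if init is not None else random.Random(seed).sample(data, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Allocation step: assign each record to its nearest mode.
        new_labels = [min(range(k), key=lambda c: matching_dissimilarity(x, modes[c]))
                      for x in data]
        if new_labels == labels:
            break  # no record changed cluster, so the modes are stable
        labels = new_labels
        # Update step: replace each non-empty cluster's mode by its frequency-based mode.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                modes[c] = column_mode(members)
    # Clustering cost: total mismatches between records and their cluster modes.
    cost = sum(matching_dissimilarity(x, modes[lab]) for x, lab in zip(data, labels))
    return labels, modes, cost


data = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
        ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r")]
labels, modes, cost = k_modes(data, k=2, init=[data[0], data[3]])
print(labels, modes, cost)  # [0, 0, 0, 1, 1, 1] [('a', 'x', 'p'), ('b', 'z', 'r')] 4
```

Passing explicit initial modes keeps the toy run deterministic; with random initialization, k-modes (like k-means) can converge to different local minima of the cost.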
A decision theoretic framework for approximating concepts
International Journal of Man-Machine Studies, 1992
Cited by 27 (13 self)
This paper explores the implications of approximating a concept based on the Bayesian decision procedure, which provides a plausible unification of the fuzzy set and rough set approaches for approximating a concept. We show that if a given concept is approximated by one set, we obtain the same result as the α-cut in fuzzy set theory. On the other hand, if a given concept is approximated by two sets, we can derive both the algebraic and the probabilistic rough set approximations. Moreover, based on the well-known principle of maximum (minimum) entropy, we give a useful interpretation of fuzzy intersection and union. Our results enhance the understanding and broaden the applications of both fuzzy and rough sets.
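To make the two-set case concrete, the following Python sketch uses the loss-based thresholds of the decision-theoretic rough set model as it is commonly stated in the later literature; the loss names `l_pp` through `l_nn` and both helper functions are hypothetical labels, and the 1992 paper's notation may differ. `l_XP` is the loss of taking action X (accept into the positive region P, defer to the boundary B, or reject into the negative region N) when an object belongs to the concept C, and `l_XN` is the loss when it does not. Each object is classified by the conditional probability Pr(C | [x]) of its equivalence class.

```python
from typing import Dict, Hashable, Set, Tuple


def dtrs_thresholds(l_pp: float, l_bp: float, l_np: float,
                    l_pn: float, l_bn: float, l_nn: float) -> Tuple[float, float]:
    """Bayesian decision thresholds (alpha, beta) derived from the six losses."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


def three_regions(prob: Dict[Hashable, float], alpha: float,
                  beta: float) -> Tuple[Set[Hashable], Set[Hashable], Set[Hashable]]:
    """Split objects by Pr(C | [x]) into positive, boundary and negative regions.

    Assumes alpha > beta. The lower approximation is the positive region; the
    upper approximation is the positive region plus the boundary region.
    """
    pos = {x for x, p in prob.items() if p >= alpha}
    neg = {x for x, p in prob.items() if p <= beta}
    bnd = set(prob) - pos - neg
    return pos, bnd, neg


alpha, beta = dtrs_thresholds(l_pp=0, l_bp=2, l_np=6, l_pn=8, l_bn=3, l_nn=0)
print(round(alpha, 3), round(beta, 3))  # 0.714 0.429
pos, bnd, neg = three_regions({"x1": 0.9, "x2": 0.6, "x3": 0.2, "x4": 0.43}, alpha, beta)
```

Setting α = 1 and β = 0 recovers Pawlak's algebraic lower and upper approximations, while intermediate thresholds give probabilistic approximations; a single threshold on the membership grade plays the role of an α-cut.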
Measurement Of Membership Functions: Theoretical And Empirical Work
1995
Cited by 19 (1 self)
This chapter presents a review of various interpretations of the fuzzy membership function together with ways of obtaining a membership function. We emphasize that different interpretations of the membership function call for different elicitation methods. We try to make this distinction clear using techniques from measurement theory.
A Systematic Approach to the Assessment of Fuzzy Association Rules
Cited by 18 (1 self)
In order to allow for the analysis of data sets that include numerical attributes, several generalizations of association rule mining based on fuzzy sets have been proposed in the literature. While the formal specification of fuzzy associations is more or less straightforward, the assessment of such rules by means of appropriate quality measures is less obvious. In particular, it presupposes an understanding of the semantic meaning of a fuzzy rule. Most existing proposals have ignored this aspect and must therefore be considered ad hoc to some extent. In this paper, we develop a systematic approach to the assessment of fuzzy association rules. To this end, we start from the idea of partitioning the data stored in a database into examples of a given rule, counterexamples, and irrelevant data. Evaluation measures are then derived from the cardinalities of the corresponding subsets. Finding a proper partition is straightforward for standard association rules but becomes nontrivial in the fuzzy case. Our results not only provide a sound justification for commonly used measures but also suggest a means of constructing meaningful alternatives.
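The idea of deriving measures from the cardinalities of examples and counterexamples can be sketched in Python as follows. This is an assumption-laden illustration rather than the paper's construction: it takes the product t-norm, so that a record with antecedent degree a and consequent degree b counts as an example to degree a·b and as a counterexample to degree a·(1 − b), and it measures cardinality with the sigma-count (sum of degrees). Records with a = 0 are irrelevant. The function name `fuzzy_rule_measures` is hypothetical.

```python
from typing import Sequence, Tuple


def fuzzy_rule_measures(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Support and confidence of a fuzzy rule A -> B.

    a[i] and b[i] are the membership degrees of record i in the antecedent A
    and the consequent B. Example and counterexample degrees use the product
    t-norm; set cardinalities use the sigma-count.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    examples = sum(ai * bi for ai, bi in zip(a, b))
    counterexamples = sum(ai * (1.0 - bi) for ai, bi in zip(a, b))
    support = examples / len(a) if a else 0.0
    relevant = examples + counterexamples  # equals the sigma-count of A here
    confidence = examples / relevant if relevant > 0 else 0.0
    return support, confidence


support, confidence = fuzzy_rule_measures([1.0, 0.5, 0.0, 0.8], [1.0, 0.4, 0.9, 0.0])
print(round(support, 3), round(confidence, 3))  # 0.3 0.522
```

With crisp (0/1) memberships these formulas reduce to the ordinary support and confidence of a standard association rule; other t-norms (for example, min) do not split a record's antecedent degree exactly into example and counterexample parts, which is one reason the choice matters in the fuzzy case.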
On Neurobiological, Neuro-Fuzzy, Machine Learning and Statistical Pattern Recognition Techniques
1997
Cited by 12 (1 self)
In this paper, we propose two new neuro-fuzzy schemes, one for classification and one for clustering problems. The classification scheme is based on Simpson's Fuzzy Min-Max method and relaxes some of its assumptions, which enables our scheme to handle classes that are not mutually exclusive. The neuro-fuzzy clustering scheme is a multiresolution algorithm modeled after the mechanics of human pattern recognition. We also present data from an exhaustive comparison of these techniques with neural, statistical, machine learning and other traditional approaches to pattern recognition applications. The data sets used for the comparisons include those from the machine learning repository at the University of California, Irvine. We find that our proposed schemes compare quite well with the existing techniques and, in addition, offer the advantages of one-pass learning and online adaptation. Keywords: Pattern Recognition, Classification, Clustering, Neuro-Fuzzy Systems, Multiresolution, Visi...
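Hyperbox-based fuzzy classifiers of the min-max family describe each class by one or more boxes with a min point v and a max point w. The Python sketch below is only a simplified illustration of how allowing non-exclusive classes changes the decision rule. It uses a hypothetical membership function that is 1 inside a box and decreases linearly (slope `gamma`) with the largest per-dimension distance outside it; this is neither Simpson's exact membership function nor the scheme proposed in the paper. It returns every class whose membership reaches a threshold instead of only the single best class. The names `box_membership` and `classify_non_exclusive` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Box = Tuple[str, Sequence[float], Sequence[float]]  # (class label, min point v, max point w)


def box_membership(x: Sequence[float], v: Sequence[float], w: Sequence[float],
                   gamma: float = 4.0) -> float:
    """1 inside the box [v, w]; decays linearly with the worst-case distance outside it."""
    outside = max(max(vi - xi, 0.0, xi - wi) for xi, vi, wi in zip(x, v, w))
    return max(0.0, 1.0 - gamma * outside)


def classify_non_exclusive(x: Sequence[float], boxes: List[Box],
                           threshold: float = 0.5) -> Set[str]:
    """Return every class label that has a box with membership >= threshold."""
    return {label for label, v, w in boxes if box_membership(x, v, w) >= threshold}


boxes = [("A", (0.0, 0.0), (1.0, 1.0)), ("B", (0.8, 0.8), (2.0, 2.0))]
print(sorted(classify_non_exclusive((0.9, 0.9), boxes)))  # ['A', 'B']
print(sorted(classify_non_exclusive((1.5, 1.5), boxes)))  # ['B']
```

A winner-take-all classifier would have to pick one label for the point (0.9, 0.9), which lies in the overlap of both boxes; the threshold rule reports both, which is the kind of behavior needed when classes may overlap.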
Mining Scientific Data
2001
Cited by 11 (3 self)
The past two decades have seen rapid advances in high-performance computing and ...
Improving Clustering Technique for Functional Approximation Problem Using Fuzzy Logic: ICFA algorithm
Lecture Notes in Computer Science, 2005
Cited by 10 (5 self)
To date, clustering techniques have always been oriented toward solving classification and pattern recognition problems. However, some authors have applied them unchanged to construct initial models for function approximators. Nevertheless, classification and function approximation problems have quite different objectives. Therefore, it is necessary to design new clustering algorithms specialized for the problem of function approximation. This paper presents a new clustering technique, specifically designed for function approximation problems, which improves the performance of the resulting approximator system compared with models derived from traditional classification-oriented clustering algorithms and input–output clustering techniques. Index Terms: clustering techniques, function approximation, model initialization.
A fuzzy linguistic methodology to deal with unbalanced linguistic term sets
IEEE Transactions on Fuzzy Systems, 2008
Cited by 9 (5 self)
Many real problems dealing with qualitative aspects use linguistic approaches to assess them. In most of these problems, a uniform and symmetrical distribution of the linguistic term sets used for linguistic modeling is assumed. However, there are problems whose assessments need to be represented by means of unbalanced linguistic term sets, i.e., term sets that are not uniformly and symmetrically distributed. The use of linguistic variables implies processes of computing with words (CW), and different computational approaches to these processes can be found in the literature. The 2-tuple fuzzy linguistic representation introduces a computational model that can handle linguistic terms precisely whenever the linguistic term set is uniformly and symmetrically distributed. In this paper, we present a fuzzy linguistic methodology to deal with unbalanced linguistic term sets. To do so, we first develop ...
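For reference, the 2-tuple representation mentioned above, in its original form for uniformly and symmetrically distributed term sets (as introduced by Herrera and Martínez), maps a value β in [0, g] over a term set S = {s_0, ..., s_g} to the pair (s_i, α) with i = round(β) and α = β − i in [−0.5, 0.5), and maps (s_i, α) back to i + α. The Python sketch below implements only that balanced case, not the unbalanced methodology this paper develops. The seven term labels and the function names `delta`, `delta_inv`, and `aggregate_mean` are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple

TwoTuple = Tuple[str, float]


def delta(beta: float, terms: Sequence[str]) -> TwoTuple:
    """Map beta in [0, g] to (s_i, alpha) with i = round(beta), alpha in [-0.5, 0.5)."""
    if not 0.0 <= beta <= len(terms) - 1:
        raise ValueError("beta must lie in [0, g]")
    i = math.floor(beta + 0.5)  # round half up, so alpha never equals +0.5
    return terms[i], beta - i


def delta_inv(t: TwoTuple, terms: Sequence[str]) -> float:
    """Map a 2-tuple (s_i, alpha) back to the numeric value i + alpha."""
    term, alpha = t
    return terms.index(term) + alpha


def aggregate_mean(assessments: List[TwoTuple], terms: Sequence[str]) -> TwoTuple:
    """Arithmetic mean of 2-tuples, computed on their numeric values."""
    return delta(mean(delta_inv(t, terms) for t in assessments), terms)


TERMS = ["N", "VL", "L", "M", "H", "VH", "P"]  # hypothetical labels, g = 6
term, alpha = aggregate_mean([("H", 0.0), ("VH", 0.0), ("VH", 0.0)], TERMS)
print(term, round(alpha, 2))  # VH -0.33
```

Because the round trip through β is exact, aggregation loses no information, which is the precision the abstract refers to. The arithmetic relies on equally spaced term indices, which is why an unbalanced term set needs a different treatment.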
An alternative extension of the k-means algorithm for clustering categorical data
Int. J. Appl. Math. Comput. Sci., 2004
Cited by 9 (0 self)
Most earlier work on clustering has focused on numerical data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. Recently, the problem of clustering categorical data has started drawing interest. However, the computational cost of most previous algorithms makes them unacceptable for clustering very large databases. The k-means algorithm is well known for its efficiency in this respect, but because it works only on numerical data, it cannot be used for clustering categorical data. The main contribution of this paper is to show how to apply the notion of “cluster centers” to a dataset of categorical objects and how to use this notion to formulate the clustering problem of categorical objects as a partitioning problem. Finally, a k-means-like algorithm for clustering categorical data is introduced. The clustering performance of the algorithm is demonstrated on two well-known data sets, namely the soybean disease and nursery databases.
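One way to give categorical data a "cluster center" that plays the role of a mean is to store, for each attribute, the relative frequency of every category among the cluster's members. An object's dissimilarity to the center is then the sum over attributes of one minus the frequency of the object's category. The Python sketch below illustrates that construction. It is an assumption for illustration and may not match the exact center or dissimilarity the paper defines; the names `frequency_center` and `dissimilarity` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Center = List[Dict[str, float]]  # one {category: relative frequency} map per attribute


def frequency_center(records: List[Sequence[str]]) -> Center:
    """Per-attribute relative frequencies of the categories in a cluster."""
    n = len(records)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*records)]


def dissimilarity(x: Sequence[str], center: Center) -> float:
    """Sum over attributes of 1 - (frequency of x's category in the cluster)."""
    return sum(1.0 - dist.get(v, 0.0) for v, dist in zip(x, center))


records = [("red", "small"), ("red", "large"), ("blue", "small"), ("red", "small")]
center = frequency_center(records)
print(center)  # [{'red': 0.75, 'blue': 0.25}, {'small': 0.75, 'large': 0.25}]
print(dissimilarity(("red", "small"), center))   # 0.5
print(dissimilarity(("green", "large"), center))  # 1.75
```

Unlike a single mode such as ("red", "small"), which would be at distance 0 from its own value and ignore the minority categories, the frequency center keeps the fact that a quarter of the cluster is "blue". In a k-means-like loop, objects would be assigned to the center with the smallest dissimilarity, and centers would be recomputed from their members.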

