An Empirical Comparison of Four Initialization Methods for the K-Means Algorithm
, 1999
Cited by 62 (0 self)
In this paper, we empirically compare four initialization methods for the K-Means algorithm: random, Forgy, MacQueen and Kaufman. Although this algorithm is known for its robustness, it is widely reported in the literature that its performance depends upon two key points: initial clustering and instance order. We conduct a series of experiments to characterize (in terms of mean, maximum, minimum and standard deviation) the probability distribution of the square-error values of the final clusters returned by the K-Means algorithm, independently of any initial clustering and of any instance order, when each of the four initialization methods is used. The results of our experiments illustrate that the random and the Kaufman initialization methods outperform the rest of the compared methods, as they make the K-Means algorithm more effective and less dependent on initial clustering and on instance order. In addition, we compare the convergence speed of the K-Means algorithm when using each o...
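The kind of experiment described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it covers only two of the four methods, and it assumes the common definitions in which "Forgy" picks K instances as seeds and "random" assigns every instance to a random cluster. The function names (`forgy_init`, `random_partition_init`, `summarize`) are hypothetical.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def forgy_init(data, k, rng):
    # Assumed definition: choose k distinct instances as initial centroids.
    return [tuple(p) for p in rng.sample(data, k)]

def random_partition_init(data, k, rng):
    # Assumed definition: assign each instance to a random cluster and
    # use the cluster means. Every label appears at least once.
    labels = list(range(k)) + [rng.randrange(k) for _ in range(len(data) - k)]
    rng.shuffle(labels)
    return [centroid([p for p, l in zip(data, labels) if l == c])
            for c in range(k)]

def kmeans(data, centroids, max_iter=100):
    # Batch K-Means; returns the final centroids and the square error.
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in data:
            j = min(range(len(centroids)),
                    key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # An emptied cluster keeps its previous centroid.
        new = [centroid(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    error = sum(min(sq_dist(p, c) for c in centroids) for p in data)
    return centroids, error

def summarize(data, k, init, runs=20, seed=0):
    # Mean, max, min and standard deviation of the square error over runs.
    rng = random.Random(seed)
    errors = [kmeans(data, init(data, k, rng))[1] for _ in range(runs)]
    mean = sum(errors) / runs
    std = (sum((e - mean) ** 2 for e in errors) / runs) ** 0.5
    return {"mean": mean, "max": max(errors), "min": min(errors), "std": std}
```

Calling `summarize(data, k, forgy_init)` and `summarize(data, k, random_partition_init)` on the same data gives the four summary statistics the abstract mentions for each method, under the assumptions stated above.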
Learning recursive Bayesian multinets for data clustering by means of constructive induction
, 2001
Cited by 18 (7 self)
This paper introduces and evaluates a new class of knowledge model, the recursive Bayesian multinet (RBMN), which encodes the joint probability distribution of a given database. RBMNs extend Bayesian networks (BNs) as well as partitional clustering systems. Briefly, an RBMN is a decision tree with component BNs at the leaves. An RBMN is learnt using a greedy, heuristic approach akin to that used by many supervised decision tree learners, but where BNs are learnt at leaves using constructive induction. A key idea is to treat expected data as real data. This allows us to complete the database and to take advantage of a closed form for the marginal likelihood of the expected complete data that factorizes into separate marginal likelihoods for each family (a node and its parents). Our approach is evaluated on synthetic and real-world databases.
Using Fuzzy Heterogeneous Neural Networks To Learn A Model Of The Central Nervous System Control
, 1998
Cited by 6 (5 self)
Fuzzy heterogeneous networks based on similarity are recently introduced feed-forward neural network models composed of neurons of a general class whose inputs are mixtures of continuous (crisp and/or fuzzy) and discrete quantities, also admitting missing data. These networks have activation functions based on similarity relations between inputs and neuron weights. They can be coupled with classical neurons in hybrid network architectures, trained with genetic algorithms. This paper compares the effectiveness of this fuzzy heterogeneous model based on similarity with that of the classical feed-forward one (scalar-product driven and using crisp quantities) in a time-series prediction setting. The results obtained show a remarkable increase in performance when departing from the classical neuron and a comparable performance when compared with other current powerful techniques, such as the FIR methodology. INTRODUCTION The notion of heterogeneous neurons was introduced in (Valdés et al., 97) as a mo...
Fuzzy Heterogeneous Neurons for Imprecise Classification Problems
Cited by 3 (3 self)
In the classical neuron model, inputs are continuous real-valued quantities. However, in many important domains from the real world, objects are described by a mixture of continuous and discrete variables, usually containing missing information and uncertainty. In this paper, a general class of neuron models accepting heterogeneous inputs in the form of mixtures of continuous (crisp and/or fuzzy) and discrete quantities admitting missing data is presented. From these, several particular models can be derived as instances and different neural architectures constructed with them. Such models deal in a natural way with problems for which information is imprecise or even missing. Their possibilities in classification and diagnostic problems are here illustrated by experiments with data from a real-world domain in the field of environmental studies. These experiments show that such neurons can both learn and classify complex data very effectively in the presence of uncertain information. K...
Multiobjective Evolutionary Optimization for Visual Data Mining with Virtual Reality Spaces: Application to
Cited by 1 (0 self)
This paper introduces a multi-objective optimization approach to the problem of computing virtual reality spaces for the visual representation of relational structures (e.g. databases), symbolic knowledge and others, in the context of visual data mining and knowledge discovery. Procedures based on evolutionary computation are discussed. In particular, the NSGA-II algorithm is used as a framework for an instance of this methodology, simultaneously minimizing Sammon's error for dissimilarity measures and the mean cross-validation error of a k-NN pattern classifier. The proposed approach is illustrated with an example from genomics (in particular, Alzheimer's disease) by constructing virtual reality spaces resulting from multi-objective optimization. Selected solutions along the Pareto front approximation are used as nonlinearly transformed features for new spaces that simultaneously compromise between similarity structure preservation (from an unsupervised perspective) and class separability (from a supervised pattern recognition perspective). The possibility of spanning a range of solutions between these two important goals is a benefit for the knowledge discovery and data understanding process. From the point of view of visual data mining, the quality of the set of discovered solutions is superior to that of the solutions obtained separately.
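One of the two objectives named in this abstract, Sammon's error, can be sketched as below. This is an illustrative implementation of the standard Sammon stress, assuming Euclidean distance in the low-dimensional space and strictly positive original dissimilarities; the function name `sammon_error` and its argument layout are hypothetical and not taken from the paper.

```python
import math
from itertools import combinations

def sammon_error(original, embedded):
    """Standard Sammon stress between two configurations.

    original: symmetric matrix (list of lists) of dissimilarities d*_ij
              in the original space, assumed > 0 for i != j.
    embedded: list of coordinate tuples in the low-dimensional space.
    """
    total_original = 0.0
    weighted_sq_diff = 0.0
    for i, j in combinations(range(len(embedded)), 2):
        d_star = original[i][j]
        # Euclidean distance between the embedded points (an assumption).
        d = math.dist(embedded[i], embedded[j])
        total_original += d_star
        weighted_sq_diff += (d_star - d) ** 2 / d_star
    return weighted_sq_diff / total_original
```

A value of 0 means the embedded distances reproduce the original dissimilarities exactly; in a multi-objective setting this value would be one coordinate of each candidate solution's objective vector, alongside the classifier's cross-validation error.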
Fuzzy Inputs and Missing Data in Similarity-Based Heterogeneous Neural Networks
- In Procs. of IWANN'99, Intl. World Conf. on Artificial and Natural Neural Networks. Accepted for Publication
, 1999
Fuzzy heterogeneous networks are recently introduced neural network models composed of neurons of a general class whose inputs and weights are mixtures of continuous variables (crisp and/or fuzzy) with discrete quantities, also admitting missing data. These networks have net input functions based on similarity relations between the inputs and the weights of a neuron. They thus accept heterogeneous (possibly missing) inputs, and can be coupled with classical neurons in hybrid network architectures, trained by means of genetic algorithms or other evolutionary methods. This paper compares the effectiveness of the fuzzy heterogeneous model based on similarity with the classical feed-forward one, in the context of an investigation in the field of environmental sciences, namely, the geochemical study of natural waters in the Arctic (Spitzbergen). Classification performance, the effect of working with crisp or fuzzy inputs, the use of traditional scalar product vs. similarity-ba...
Fuzzy Heterogeneous Neurons for Imprecise Classification Problems
In the classical neuron model, inputs are continuous real-valued quantities. However, in many important domains from the real world, objects are described by a mixture of continuous and discrete variables, usually containing missing information and uncertainty. In this paper, a general class of neuron models accepting heterogeneous inputs in the form of mixtures of continuous (crisp and/or fuzzy) and discrete quantities admitting missing data is presented. From these, several particular models can be derived as instances and different neural architectures constructed with them. Such models deal in a natural way with problems for which information is imprecise or even missing. Their possibilities in classification and diagnostic problems are here illustrated by experiments with data from a real-world domain in the field of environmental studies. These experiments show that such neurons can both learn and classify complex data very effectively in the presence of uncertain information. Ke...
Similarity-based Heterogeneous Neural Networks
This research introduces a general class of functions serving as generalized neuron models to be used in artificial neural networks. They are cast in the common framework of computing a similarity function, a flexible definition of a neuron as a pattern recognizer. The similarity endows the model with a clear conceptual view and leads naturally to the handling of heterogeneous information, in the form of mixtures of continuous numbers (crisp or fuzzy), linguistic information and discrete quantities (ordinal, nominal and finite sets). Missing data are also explicitly considered. The absence of coding schemes and the precise computation attributed to the neurons make the networks highly interpretable. The resulting heterogeneous neural networks are trained by means of a special-purpose genetic algorithm. The cooperative integration of different soft computing techniques (neural networks, evolutionary algorithms and fuzzy sets) makes these networks capable of learning from non-trivial data sets with remarkable effectiveness, comparable or superior to that of classical models. This claim is demonstrated by a set of experiments on benchmark real-world data sets.
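The idea of a neuron that computes a similarity between heterogeneous inputs and its weights can be illustrated with a small sketch. The abstract does not specify the similarity function, so everything here is an assumption: the sketch averages per-feature partial similarities over the features that are present (a Gower-style aggregation), handles only crisp continuous and nominal features, and omits fuzzy, ordinal and set-valued inputs as well as any activation function. The name `neuron_similarity` is hypothetical.

```python
def neuron_similarity(inputs, weights, kinds, ranges):
    """Average partial similarity between inputs and weights.

    inputs:  feature values; None marks a missing value.
    weights: the neuron's weight for each feature, same types as inputs.
    kinds:   "continuous" or "nominal" for each feature.
    ranges:  value range of each continuous feature (ignored for nominal).
    """
    total, present = 0.0, 0
    for x, w, kind, r in zip(inputs, weights, kinds, ranges):
        if x is None:
            # Assumed treatment: missing inputs are skipped, not imputed.
            continue
        if kind == "continuous":
            # Range-normalized closeness, clipped to [0, 1].
            s = 1.0 - min(abs(x - w) / r, 1.0)
        elif kind == "nominal":
            # Exact match or nothing.
            s = 1.0 if x == w else 0.0
        else:
            raise ValueError(f"unsupported feature kind: {kind}")
        total += s
        present += 1
    # With no observed features there is no evidence of similarity.
    return total / present if present else 0.0
```

Because each feature is compared in its own terms, no numeric coding of nominal values or imputation of missing ones is needed, which is the interpretability point the abstract makes.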
nrc-cnrc.gc.ca
Two medical data sets (Breast cancer and Colon cancer) are investigated within a visual data mining paradigm through the unsupervised construction of virtual reality spaces using genetic programming and classical optimization (for comparison purposes). Constructing the desired visual spaces requires a modified genetic programming approach that generates programs representing vector functions. The extension leads to populations that are composed of forests, instead of single expression trees. No particular kind of genetic programming algorithm is required, due to the generic nature of the approach taken in the paper. The results (visual spaces) show that the relationships between the data objects and their classes can be appreciated in all of the obtained spaces regardless of the mapping error. In addition, genetic programming produced spaces with lower mapping errors than a classical optimizer, as well as relatively simple equations. Further, the set of obtained equations can be statistically analyzed in terms of the original attributes in order to better understand how the new nonlinear features are derived. Thus, explicit mappings provided by genetic programming can be used for feature selection and generation in data mining where scalar and/or vector functions are involved.

