

Understanding the Crucial Role of Attribute Interaction in Data Mining

by A. A. Freitas
Venue: Artificial Intelligence Review


Results 1-10 of 28

Data Mining with an Ant Colony Optimization Algorithm

by Rafael S. Parpinelli, Heitor S. Lopes, Alex A. Freitas - IEEE Transactions on Evolutionary Computation, 2002
Cited by 50 (8 self)
Abstract – This work proposes an algorithm for data mining called Ant-Miner (Ant Colony-based Data Miner). The goal of Ant-Miner is to extract classification rules from data. The algorithm is inspired by both research on the behavior of real ant colonies and some data mining concepts and principles. We compare the performance of Ant-Miner with CN2, a well-known data mining algorithm for classification, on six public-domain data sets. The results provide evidence that: (a) Ant-Miner is competitive with CN2 with respect to predictive accuracy; and (b) the rule lists discovered by Ant-Miner are considerably simpler (smaller) than those discovered by CN2. Index Terms – Ant Colony Optimization, data mining, knowledge discovery, classification.
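The abstract gives no algorithmic detail, so the following is only a simplified Python sketch in the spirit of ACO-based rule discovery such as Ant-Miner. It assumes that terms are (attribute, value) pairs, that an ant picks its next term with probability proportional to pheromone times a heuristic value, that rule quality is sensitivity times specificity, and that evaporation is modelled by renormalising the pheromone. The attribute names and heuristic values are hypothetical, and rule pruning and stopping criteria are omitted.

```python
import random


def choose_term(pheromone, heuristic, used_attrs, rng):
    """Pick the next term with probability proportional to pheromone * heuristic,
    skipping attributes already used in the rule under construction."""
    candidates = [t for t in pheromone if t[0] not in used_attrs]
    weights = [pheromone[t] * heuristic[t] for t in candidates]
    if not candidates or sum(weights) == 0:
        return None
    return rng.choices(candidates, weights=weights, k=1)[0]


def rule_quality(tp, fp, fn, tn):
    """Quality = sensitivity * specificity of the rule on the training data."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity


def update_pheromone(pheromone, rule_terms, quality):
    """Reinforce the rule's terms in proportion to its quality, then renormalise
    so that terms not in the rule effectively evaporate."""
    for t in rule_terms:
        pheromone[t] += pheromone[t] * quality
    total = sum(pheromone.values())
    for t in pheromone:
        pheromone[t] /= total


# Hypothetical terms and heuristic values.
rng = random.Random(0)
pheromone = {("outlook", "sunny"): 0.5, ("windy", "true"): 0.5}
heuristic = {("outlook", "sunny"): 0.8, ("windy", "true"): 0.2}
term = choose_term(pheromone, heuristic, used_attrs=set(), rng=rng)
update_pheromone(pheromone, [term], rule_quality(tp=8, fp=2, fn=2, tn=8))
print(term, pheromone)
```

Renormalising after reinforcement keeps the pheromone values a probability-like distribution, so terms that never appear in good rules lose weight without an explicit evaporation rate.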

Quantifying and visualizing attribute interactions: An approach based on entropy

by Aleks Jakulin, Ivan Bratko - http://arxiv.org/abs/cs.AI/0308002 v3, 2004
Cited by 20 (4 self)
Interactions are patterns between several attributes in data that cannot be inferred from any subset of these attributes. While mutual information is a well-established approach to evaluating the interactions between two attributes, we surveyed its generalizations so as to quantify interactions between several attributes. We have chosen McGill’s interaction information, which has been independently rediscovered a number of times under various names in various disciplines, because of its many intuitively appealing properties. We apply interaction information to visually present the most important interactions of the data. Visualization of interactions has provided insight into the structure of data in a number of domains, identifying redundant attributes and opportunities for constructing new features, discovering unexpected regularities in data, and helping during the construction of predictive models; we illustrate the methods on numerous examples. A machine learning method that disregards interactions may get caught in two traps: myopia is caused by learning algorithms assuming independence in spite of interactions, whereas fragmentation arises from assuming an interaction in spite of independence.
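Three-way interaction information can be computed from empirical joint entropies. The Python sketch below uses plug-in estimates on discrete data and the sign convention I(A;B;C) = I(A;B|C) - I(A;B), under which positive values indicate synergy and negative values redundancy; the function names are illustrative and not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Plug-in joint entropy, in bits, of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())


def interaction_information(a, b, c):
    """McGill's interaction information I(A;B;C) = I(A;B|C) - I(A;B),
    expanded into joint entropies."""
    return (entropy(a, b) + entropy(a, c) + entropy(b, c)
            - entropy(a) - entropy(b) - entropy(c)
            - entropy(a, b, c))


# XOR: C is determined by A and B jointly, but by neither alone (synergy).
a, b = [0, 0, 1, 1], [0, 1, 0, 1]
c = [x ^ y for x, y in zip(a, b)]
print(interaction_information(a, b, c))  # 1.0
# Three copies of the same attribute (redundancy).
print(interaction_information(a, a, a))  # -1.0
```

The XOR case is exactly the pattern a learner assuming independence would miss ("myopia"): each pair of attributes shares no information, yet the triple is fully dependent.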

Genetic programming for attribute construction in data mining

by Fernando E. B. Otero, Monique M. S. Silva, Alex A. Freitas, Julio C. Nievola - Genetic Programming, Proceedings of EuroGP’2003, volume 2610 of LNCS, 2003
Cited by 14 (0 self)
Abstract. For a given data set, its set of attributes defines its data space representation. The quality of a data space representation is one of the most important factors influencing the performance of a data mining algorithm. The attributes defining the data space can be inadequate, making it difficult to discover high-quality knowledge. To address this problem, this paper proposes a Genetic Programming algorithm developed for attribute construction. This algorithm constructs new attributes out of the original attributes of the data set, performing an important preprocessing step for the subsequent application of a data mining algorithm.
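In GP-based attribute construction, each individual is an expression tree over the original attributes, and evaluating the tree on every record yields a new attribute. The minimal Python sketch below shows only that evaluation step; the function set, the attribute names, and the protected-division convention (a common GP practice) are assumptions, and the evolutionary search and fitness measure used in the paper are not shown.

```python
import operator


def protected_div(x, y):
    """Division that returns 1.0 when the divisor is zero, a common GP convention."""
    return x / y if y != 0 else 1.0


# Hypothetical function set for the GP trees.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": protected_div}


def evaluate(tree, row):
    """Evaluate a tree on one record (a dict of original attributes).

    A tree is an attribute name (str), a numeric constant, or a tuple
    (op, left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return row[tree]
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, row), evaluate(right, row))


def construct_attribute(tree, rows, name):
    """Return copies of the records extended with the constructed attribute."""
    return [{**row, name: evaluate(tree, row)} for row in rows]


rows = [{"income": 50000, "debt": 10000}, {"income": 0, "debt": 5}]
new_rows = construct_attribute(("/", "debt", "income"), rows, "debt_ratio")
print([r["debt_ratio"] for r in new_rows])  # [0.2, 1.0]
```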

Discovering interesting knowledge from a science & technology database with a genetic algorithm

by Wesley Romão, Alex A. Freitas, Itana M. De S. Gimenes - In Applied Soft Computing 4, 2004
Cited by 12 (3 self)
Data mining consists of extracting interesting knowledge from data. This paper addresses the discovery of knowledge in the form of IF-THEN prediction rules, which are a popular form of knowledge representation in data mining. In this context, we propose a Genetic Algorithm (GA) designed specifically to discover interesting fuzzy prediction rules. The GA searches for prediction rules that are interesting in the sense of being new and surprising for the user. This is done by adapting a technique little exploited in the literature, which is based on user-defined general impressions (subjective knowledge). More precisely, a prediction rule is considered interesting (or surprising) to the extent that it represents knowledge that not only was previously unknown by the user but also contradicts the user's original beliefs. In addition, the use of fuzzy logic helps to improve the comprehensibility of the rules discovered by the GA, because it relies on linguistic terms that are natural for the user. A prototype was implemented and applied to a real-world science & technology database containing data about the scientific production of researchers. The GA implemented in this prototype was evaluated by comparing it with the J4.8 algorithm, a variant of the well-known C4.5 algorithm. Experiments were carried out to evaluate both the predictive accuracy and the degree of interestingness (or surprisingness) of the rules discovered by both algorithms. The predictive accuracy obtained by the proposed GA was similar to that obtained by J4.8, but ...
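Linguistic terms such as "low" or "high" are usually represented by fuzzy membership functions. The Python sketch below uses trapezoidal membership and the minimum as fuzzy AND, both standard fuzzy-logic choices; the attributes, terms, and breakpoints are hypothetical, and the paper's own membership functions and its surprisingness measure are not reproduced here.

```python
import math


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], 0 outside (a, d), linear in between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical linguistic terms; infinite breakpoints give open "shoulders".
TERMS = {
    "papers": {
        "low": (-math.inf, -math.inf, 2, 5),
        "medium": (2, 5, 10, 15),
        "high": (10, 15, math.inf, math.inf),
    },
    "years_active": {
        "junior": (-math.inf, -math.inf, 3, 6),
        "senior": (3, 6, math.inf, math.inf),
    },
}


def membership(attr, value, term):
    return trapezoid(value, *TERMS[attr][term])


def antecedent_degree(record, conditions):
    """Degree to which a record satisfies 'attr1 is term1 AND ...' (min as fuzzy AND)."""
    return min(membership(attr, record[attr], term) for attr, term in conditions)


print(membership("papers", 12, "medium"), membership("papers", 12, "high"))  # 0.6 0.4
```

A value of 12 papers is thus partly "medium" and partly "high", which lets a rule condition such as "papers is high" degrade gradually instead of flipping at a crisp threshold.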

A Hybrid Approach to Feature Selection and Generation Using an Evolutionary Algorithm

by Oliver Ritthoff, Ralf Klinkenberg, Simon Fischer, Ingo Mierswa, 2002
Cited by 10 (2 self)
Genetic algorithms have proved to work well on feature selection problems where the search space produced by the initial feature set already contains the hypothesis to be learned. In cases where this premise is not fulfilled, one needs to find or generate new features to adequately extend the search space. As a solution to this representation problem, we introduce a framework that combines feature selection and generation in a wrapper-based approach, using a modified genetic algorithm for the feature transformation and an inductive learner for the evaluation of the constructed feature set. The basic idea is to combine the positive search properties of conventional genetic algorithms with an incremental adaptation of the search space. To evaluate this hybrid feature selection and generation approach, we compare it to several feature selection wrappers on both artificial and real-world data.
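In a wrapper approach, each candidate feature subset is scored by running an inductive learner on it. The minimal Python sketch below covers only the selection half of such a framework (feature generation is omitted) and is not the authors' modified genetic algorithm. It assumes a bitmask encoding, a leave-one-out 1-nearest-neighbour learner as the evaluator, binary tournament selection, uniform crossover, bit-flip mutation with elitism, and hypothetical toy data.

```python
import random

# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 are noise.
X = [
    [0.0, 5.0, 3.0], [0.1, 1.0, 9.0], [0.2, 9.0, 1.0],
    [1.0, 5.1, 3.1], [1.1, 1.1, 8.9], [1.2, 8.9, 1.2],
]
y = [0, 0, 0, 1, 1, 1]


def loo_1nn_accuracy(X, y, mask):
    """Wrapper fitness: leave-one-out accuracy of 1-nearest-neighbour
    restricted to the features with mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k != i:
                d = sum((xi[j] - xk[j]) ** 2 for j in cols)
                if d < best_dist:
                    best_dist, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def ga_feature_selection(X, y, pop_size=8, generations=15, p_mut=0.1, seed=0):
    """Simple generational GA over feature bitmasks with elitism."""
    rng = random.Random(seed)
    n_features = len(X[0])

    def fitness(mask):
        return loo_1nn_accuracy(X, y, mask)

    def tournament(pop):
        return max(rng.sample(pop, 2), key=fitness)  # binary tournament

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = [max(pop, key=fitness)]  # elitism: keep the current best mask
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)


print(loo_1nn_accuracy(X, y, [1, 0, 0]), loo_1nn_accuracy(X, y, [1, 1, 1]))  # 1.0 0.0
print(ga_feature_selection(X, y))
```

On this toy data the noisy features make every record's nearest neighbour belong to the other class, so the full feature set scores 0.0 while feature 0 alone scores 1.0. This is the kind of gap a wrapper fitness exposes and a GA can exploit.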

A Review of Evolutionary Algorithms for Data Mining

by Alex A. Freitas - In: Soft Computing for Knowledge Discovery and Data Mining, 2007
Cited by 6 (4 self)
Summary. Evolutionary Algorithms (EAs) are stochastic search algorithms inspired by the process of neo-Darwinian evolution. The motivation for applying EAs to data mining is that they are robust, adaptive search techniques that perform a global search in the solution space. This chapter first presents a brief overview of EAs, focusing mainly on two kinds of EAs, viz. Genetic Algorithms (GAs) and Genetic Programming (GP). Then the chapter reviews the main concepts and principles used by EAs designed for solving several data mining tasks, namely: discovery of classification rules, clustering, attribute selection and attribute construction. Finally, it discusses Multi-Objective EAs, based on the concept of Pareto dominance, and their use in several data mining tasks.
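Pareto dominance, on which the multi-objective EAs mentioned above are based, can be stated in a few lines. The Python sketch below assumes all objectives are to be maximised; the candidate rule sets and their (accuracy, negative rule count) scores are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximised):
    a is at least as good in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(solutions):
    """Return the non-dominated solutions, preserving input order."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]


# Hypothetical rule sets scored as (accuracy, -number_of_rules), both maximised.
candidates = [(0.90, -12), (0.85, -5), (0.80, -6), (0.92, -20)]
print(pareto_front(candidates))  # [(0.9, -12), (0.85, -5), (0.92, -20)]
```

Only (0.80, -6) is dropped, because (0.85, -5) is both more accurate and smaller; the remaining three represent different accuracy/simplicity trade-offs, none better than another on both objectives.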

Towards a genetic programming algorithm for automatically evolving rule induction algorithms

by Alex A. Freitas - Proc. ECML/PKDD-2004 Workshop on Advances in Inductive Learning. (2004) 93–108
Cited by 5 (0 self)
Abstract. Rule induction is one of the most widely used techniques for extracting knowledge from data, since the representation of knowledge as if/then rules is very intuitive and easily understandable by problem-domain experts. Existing rule induction algorithms have been manually designed. In this era of increasing automation, Genetic Programming (GP) represents a powerful tool for automatically evolving computer programs. This work proposes a genetic programming algorithm for automatically evolving rule induction algorithms. Hence, the evolved rule induction algorithm will, to a large extent, be free from the human biases that are implicitly incorporated in current manually designed algorithms (such as the typical use of a greedy search method). This is a very ambitious, adventurous goal which, if successful, will pave the way for a new generation of more robust, considerably less greedy rule induction algorithms. In particular, an automatically evolved rule induction algorithm can be designed to cope with attribute interaction better than current greedy rule induction algorithms, which will tend to lead to improved performance on complex data sets.
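Why greedy search struggles with attribute interaction can be seen on an XOR-like concept. A greedy inducer that scores one candidate attribute at a time sees zero information gain from either attribute, even though the pair determines the class exactly. The Python sketch below is a generic illustration of that problem, with hypothetical data, and is not part of the proposed GP algorithm.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Empirical class entropy in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attrs):
    """Class-entropy reduction from splitting on the joint values of the given attributes."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(tuple(row[a] for a in attrs), []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical XOR-like concept: class = attribute 0 XOR attribute 1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]
print(information_gain(rows, labels, [0]))     # 0.0: attribute 0 alone looks useless
print(information_gain(rows, labels, [1]))     # 0.0: so does attribute 1
print(information_gain(rows, labels, [0, 1]))  # 1.0: together they determine the class
```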

Application of genetic algorithms to the discovery of complex models for simulation studies in human genetics

by Jason H. Moore, Lance W. Hahn, Marylyn D. Ritchie, Tricia A. Thornton, Bill C. White - In: W.B. Langdon ... Jonoska (eds.), Proceedings of the Genetic and Evolutionary Computation Conference, 2002
Cited by 4 (3 self)
Simulation studies are useful in various disciplines for a number of reasons, including the development and evaluation of new computational and statistical methods. This is particularly true in human genetics and genetic epidemiology, where new analytical methods are needed for the detection and characterization of disease susceptibility genes whose effects are complex, nonlinear, and partially or solely dependent on the effects of other genes. Despite this need, the development of complex genetic models that can be used to simulate data is not always intuitive. In fact, only a few such models have been published. In this paper, we present a strategy for identifying complex genetic models for simulation studies that utilizes genetic algorithms. The genetic models used in this study are penetrance functions that define the probability of disease given that a specific DNA sequence variation has been inherited. We demonstrate that the genetic algorithm approach routinely identifies interesting and useful penetrance functions in a human-competitive manner.
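A two-locus penetrance function can be written as a 3x3 table of disease probabilities, one per genotype combination. A model whose effect is "solely dependent on the effects of other genes" shows no marginal (single-locus) effect. The Python sketch below computes marginal penetrances under the assumptions of Hardy-Weinberg equilibrium and independent loci; the variance-based `marginal_effect` score is a hypothetical objective a GA could minimise, since the paper's actual fitness function is not described in the abstract.

```python
def genotype_freqs(p):
    """Hardy-Weinberg frequencies of genotypes (AA, Aa, aa) for allele frequency p."""
    q = 1.0 - p
    return [p * p, 2 * p * q, q * q]


def marginal_penetrances(table, p_a, p_b):
    """Single-locus marginal penetrances of a 3x3 two-locus penetrance table.

    table[i][j] = P(disease | genotype i at locus A, genotype j at locus B).
    Assumes Hardy-Weinberg equilibrium and independently segregating loci.
    """
    fa, fb = genotype_freqs(p_a), genotype_freqs(p_b)
    locus_a = [sum(table[i][j] * fb[j] for j in range(3)) for i in range(3)]
    locus_b = [sum(table[i][j] * fa[i] for i in range(3)) for j in range(3)]
    return locus_a, locus_b


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def marginal_effect(table, p_a, p_b):
    """Hypothetical GA objective: 0 when neither locus has a main effect on its own."""
    locus_a, locus_b = marginal_penetrances(table, p_a, p_b)
    return variance(locus_a) + variance(locus_b)


# Penetrance 1 when exactly one locus is heterozygous, 0 otherwise.
epistatic = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]
print(marginal_penetrances(epistatic, 0.5, 0.5))  # ([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
print(marginal_effect(epistatic, 0.5, 0.5))       # 0.0
```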

New results for a hybrid decision tree/genetic algorithm for data mining

by Deborah R. Carvalho, Alex A. Freitas - Nottingham Trent Univ, UK, 2002
Cited by 4 (1 self)
This paper addresses the well-known data mining task of discovering classification rules [5]. A classification rule is a prediction rule of the form: IF <conditions> THEN <prediction (class)>. An example of a classification rule is: “IF (Age > 25) AND (Salary > $50,000) THEN (Credit = good)”. Classification rules in this format have the advantage of being intuitively comprehensible for the user, ...
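The example rule above can be represented and applied directly. The Python sketch below encodes its antecedent as a list of (attribute, comparison, value) conditions and computes two common rule statistics, coverage and accuracy; the dictionary layout and the records are hypothetical and for illustration only.

```python
import operator

# The example rule: IF (Age > 25) AND (Salary > 50000) THEN (Credit = good).
RULE = {
    "conditions": [("Age", operator.gt, 25), ("Salary", operator.gt, 50_000)],
    "prediction": ("Credit", "good"),
}


def covers(rule, record):
    """True if the record satisfies every condition in the rule antecedent."""
    return all(op(record[attr], value) for attr, op, value in rule["conditions"])


def coverage_and_accuracy(rule, records):
    """Fraction of records the rule covers, and fraction of covered records
    whose class matches the rule's prediction."""
    covered = [r for r in records if covers(rule, r)]
    attr, cls = rule["prediction"]
    coverage = len(covered) / len(records) if records else 0.0
    accuracy = sum(r[attr] == cls for r in covered) / len(covered) if covered else 0.0
    return coverage, accuracy


# Hypothetical records.
records = [
    {"Age": 30, "Salary": 60_000, "Credit": "good"},
    {"Age": 40, "Salary": 80_000, "Credit": "bad"},
    {"Age": 22, "Salary": 70_000, "Credit": "good"},
    {"Age": 35, "Salary": 30_000, "Credit": "bad"},
]
print(coverage_and_accuracy(RULE, records))  # (0.5, 0.5)
```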

Input variable selection: Mutual information and linear mixing measures

by Jie Ouyang, Andrew Back - IEEE Trans. on Knowledge and Data Engineering
Cited by 4 (0 self)
Abstract – Determining the most appropriate inputs to a model has a significant impact on the performance of the model and associated algorithms for classification, prediction and data analysis. Previously we proposed an algorithm, ICAIVS, which utilizes independent component analysis (ICA) as a preprocessing stage to overcome issues of dependencies between inputs before the data are passed to an input variable selection (IVS) stage. While we previously demonstrated with artificial data that ICA can prevent an overestimation of necessary input variables, we show here that mixing between input variables is common in real-world datasets, so that ICA preprocessing is useful in practice. This experimental test is based on new measures introduced in this paper. Furthermore, we extend the implementation of our variable selection scheme to a statistical dependency test based on mutual information and test several algorithms on Gaussian and sub-Gaussian signals. Specifically, we propose a novel method of quantifying linear dependencies using ICA estimates of mixing matrices with a new Linear Mixing Measure (LMM). Index Terms – Input variable selection, modeling, data preprocessing, independent component analysis, mutual information estimation.
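Mutual information measures the statistical dependence between a candidate input and the target, so inputs can be ranked by it. The Python sketch below is a generic plug-in MI filter for discrete (or already discretised) data. It does not reproduce the ICA preprocessing or the Linear Mixing Measure proposed in the paper, continuous signals would first need binning or a density-based estimator, and the variable names and data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical entropy in bits of a sequence of hashable values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete data, in bits."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def rank_inputs(inputs, target):
    """Rank candidate input variables by estimated mutual information with the target."""
    scores = {name: mutual_information(col, target) for name, col in inputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical discretised inputs and target.
target = [0, 0, 1, 1, 0, 1]
inputs = {
    "x_informative": [0, 0, 1, 1, 0, 1],
    "x_noise": [0, 1, 0, 1, 1, 0],
}
print(rank_inputs(inputs, target))
```

Plug-in estimates are biased upward on small samples, which is why `x_noise` still receives a small positive score here; a dependency test (for example, comparing against scores on permuted targets) is the usual way to decide whether such a value is meaningful.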

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University