Understanding the crucial differences between classification and discovery of association rules - a position paper (0)

by A A Freitas
Venue:ACM SIGKDD Explorations

Results 1 - 10 of 11

A survey of evolutionary algorithms for data mining and knowledge discovery

by Alex A. Freitas - In: A. Ghosh and S. Tsutsui (Eds.), Advances in Evolutionary Computation, 2002
Cited by 73 (3 self)
This chapter discusses the use of evolutionary algorithms, particularly genetic algorithms and genetic programming, in data mining and knowledge discovery. We focus on the data mining task of classification. In addition, we discuss some preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and pruning of an ensemble of classifiers. We show how the requirements of data mining and knowledge discovery influence the design of evolutionary algorithms. In particular, we discuss how individual representation, genetic operators and fitness functions have to be adapted for extracting high-level knowledge from data.

Revisiting the foundations of Artificial Immune Systems: a problem-oriented perspective

by Alex A. Freitas, Jon Timmis - Hart (Eds.) Artificial Immune Systems (Proc. ICARIS-2003), LNCS 2787 , 2003
Cited by 39 (23 self)
This paper advocates a problem-oriented approach for the design of Artificial Immune Systems (AIS) for data mining. By problem-oriented approach we mean that, in real-world data mining applications, the design of an AIS should take into account the characteristics of the data to be mined together with the application domain: the components of the AIS – such as its representation, affinity function and immune process – should be tailored for the data and the application. This is in contrast with the majority of the literature, where a very generic AIS algorithm for data mining is developed and there is little or no concern in tailoring the components of the AIS for the data to be mined or the application domain. To support this problem-oriented approach, we provide an extensive critical review of the current literature on AIS for data mining, focusing on the data mining tasks of classification and anomaly detection. We discuss several important lessons to be taken from the natural immune system to design new AIS that are considerably more adaptive than current AIS. Finally, we conclude the paper with a summary of seven limitations of current AIS for data mining and 10 suggested research directions.

Discovering interesting knowledge from a science & technology database with a genetic algorithm

by Wesley Romão, Alex A. Freitas, Itana M. De S. Gimenes - In Applied Soft Computing 4 , 2004
Cited by 12 (3 self)
Data mining consists of extracting interesting knowledge from data. This paper addresses the discovery of knowledge in the form of prediction IF-THEN rules, which are a popular form of knowledge representation in data mining. In this context, we propose a Genetic Algorithm (GA) designed specifically to discover interesting fuzzy prediction rules. The GA searches for prediction rules that are interesting in the sense of being new and surprising for the user. This is done by adapting a technique little exploited in the literature, which is based on user-defined general impressions (subjective knowledge). More precisely, a prediction rule is considered interesting (or surprising) to the extent that it represents knowledge that not only was previously unknown to the user but also contradicts the user's original beliefs. In addition, the use of fuzzy logic helps to improve the comprehensibility of the rules discovered by the GA. This is due to the use of linguistic terms that are natural for the user. A prototype was implemented and applied to a real-world science & technology database, containing data about the scientific production of researchers. The GA implemented in this prototype was evaluated by comparing it with the J4.8 algorithm, a variant of the well-known C4.5 algorithm. Experiments were carried out to evaluate both the predictive accuracy and the degree of interestingness (or surprisingness) of the rules discovered by both algorithms. The predictive accuracy obtained by the proposed GA was similar to the one obtained by J4.8, but

Knowledge discovery with genetic programming for providing feedback to courseware author

by Cristóbal Romero, Sebastián Ventura, Paul De Bra - User Modeling and User-Adapted Interaction: The Journal of Personalization Research
Cited by 10 (8 self)
We introduce a methodology to improve Adaptive Systems for Web-Based Education. This methodology uses evolutionary algorithms as a data mining method for discovering interesting relationships in students’ usage data. Such knowledge may be very useful for teachers and course authors to select the most appropriate modifications to improve the effectiveness of the course. We use Grammar-Based Genetic Programming (GBGP) with multi-objective optimization techniques to discover prediction rules. We present a specific data mining tool that can help non-experts in data mining carry out the complete rule discovery process, and demonstrate its utility by applying it to an adaptive Linux course that we developed. Key words: adaptive system for web-based education, data mining, evolutionary algorithms, grammar-based genetic programming, prediction rules

A distributed-population genetic algorithm for discovering interesting prediction rules

by Edgar Noda, Alex A. Freitas, Akebo Yamakami - In Proceedings of the 7th Online World Conference on Soft Computing in Industrial Applications , 2002
Cited by 3 (1 self)
In data mining the quality of prediction rules basically involves three criteria: accuracy, comprehensibility and interestingness. The majority of the rule induction literature focuses on discovering accurate, comprehensible rules. In this paper we also take these two criteria into account, but we go beyond them in the sense that we aim at discovering rules that are interesting (surprising) for the user. The search is performed by a distributed genetic algorithm (DGA) specifically designed for the discovery of interesting rules. DGAs constitute an interesting approach to tackle the premature convergence problem in evolutionary algorithms. In our approach the partition of the search space into semi-isolated subpopulations (demes) represents a subdivision of the task. We model the migration procedure of DGAs as an explicit means to promote cooperation among the demes. The algorithm addresses the dependence modeling task of data mining, where different rules can predict different goal attributes. This task can be regarded as a generalization of the very well known classification task, where all rules predict the same goal attribute. This paper also compares the results of the DGA with the results of a single-population genetic algorithm to discover interesting rules.

A Multiview Approach for Intelligent Data Analysis based on Data Operators

by Yaohua Chen, Yiyu Yao , 2008
Cited by 3 (0 self)
Multiview intelligent data analysis explores data from different perspectives to reveal various types of structures and knowledge embedded in the data. Each view may capture a specific aspect of the data and hence satisfy the needs of a particular group of users. Collectively, multiple views provide a comprehensive description and understanding of the data. In this paper, we propose a multiview framework of intelligent data analysis based on modal-style data operators. The classes of the data operators include basic set assignment, sufficiency, dual sufficiency, necessity and possibility operators. They demonstrate various types of data relationships and characterize various features and granulated views of the data. It is shown that different structures of the data can also be constructed based on the different data operators.

Contribution of statistical learning to validation of association rules

by Olivier Teytaud, Stephane Lallich - Proceedings of CAp, 2001
Many measures aim at evaluating the interest of association rules. The subject of this article is the detailed study of confidence intervals associated with the evaluation of these measures. The following difficulties arise: Samples being finite, we restrict our attention to non-asymptotic bounds. The number of tested rules can be large, so it is not statistically possible to treat the rules separately: risks accumulate and one could thus "validate" absurd rules. We do not only work on rules without exception; rules with confidence lower than 1 can be important. The solution we propose is based upon VC-dimension, a classical tool of learning theory. Keywords: Validation, Rule extraction, Databases, Uniform non-asymptotic bounds, learning theory, VC-theory.

A Parameter-Free Associative Classification Method

by Loïc Cerf, Dominique Gay, Nazha Selmaoui, Jean-François Boulicaut
In many application domains, classification tasks have to tackle multiclass imbalanced training sets. We have been looking for a CBA approach (Classification Based on Association rules) in such difficult contexts. Actually, most of the CBA-like methods are one-vs-all approaches (OVA), i.e., selected rules characterize a class with what is relevant for this class and irrelevant for the union of the other classes. Instead, our method considers that a rule has to be relevant for one class and irrelevant for every other class taken separately. Furthermore, a constrained hill climbing strategy spares users tuning parameters and/or spending time in tedious post-processing phases. Our approach is empirically validated on various benchmark data sets. Keywords: Classification, Association Rules, Parameter Tuning, Multiclass.
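The contrast drawn in this abstract between one-vs-all selection and per-class selection can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the class sizes, the match counts and the fixed thresholds `min_freq` and `max_freq` are all hypothetical, and the paper itself describes a method that spares users such tuning. The sketch only shows why, on an imbalanced training set, pooling the other classes can hide a rule that is frequent in a small class.

```python
# Hypothetical illustration only: thresholds and counts are assumptions,
# not values or procedures taken from the paper.

def relative_freq(matches, sizes, classes):
    """Fraction of objects in the given classes that match the rule body."""
    return sum(matches[c] for c in classes) / sum(sizes[c] for c in classes)

def ova_accepts(target, matches, sizes, min_freq, max_freq):
    """One-vs-all: the other classes are pooled into a single union."""
    others = [c for c in sizes if c != target]
    return (relative_freq(matches, sizes, [target]) >= min_freq
            and relative_freq(matches, sizes, others) <= max_freq)

def per_class_accepts(target, matches, sizes, min_freq, max_freq):
    """Per-class: the rule must be infrequent in every other class separately."""
    others = [c for c in sizes if c != target]
    return (relative_freq(matches, sizes, [target]) >= min_freq
            and all(relative_freq(matches, sizes, [c]) <= max_freq
                    for c in others))

# Imbalanced toy data: class C is small, class B is large.
sizes = {"A": 100, "B": 900, "C": 20}
# Number of objects of each class matched by one candidate rule body.
matches = {"A": 60, "B": 9, "C": 10}

ova_result = ova_accepts("A", matches, sizes, min_freq=0.5, max_freq=0.1)
per_class_result = per_class_accepts("A", matches, sizes, min_freq=0.5, max_freq=0.1)
```

With these assumed numbers the rule covers 60% of class A and only 19 of the 920 pooled objects of B and C, so the one-vs-all check accepts it, while the per-class check rejects it because the rule also covers half of the small class C.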

1 School of Informatics and Engineering,

by Anna Shillabeer, John F. Roddick
Medical science has a long history characterised by incidents of extraordinary insights that have resulted in a paradigm shift in the methodologies and approaches used and have moved the discipline forward. While knowledge discovery has much to offer medicine, it cannot be done in ignorance of either this history or the norms of modern medical investigation. This paper explores the lineage of medical knowledge acquisition and discusses the adverse perceptions that data mining techniques will have to surmount to gain acceptance. Keywords: Medical and Health Data Mining.

A Critical Review of Rule Surprisingness

by D. R. Carvalho, A. A. Freitas, N. F. F. Ebecken
In data mining it is usually desirable that discovered knowledge have some characteristics such as being as accurate as possible, comprehensible and surprising to the user. The vast majority of data mining algorithms produce, as part of their results, information of a statistical nature that allows the user to assess how accurate and reliable the discovered knowledge is. However, in many cases this is not enough for the user. Even if discovered knowledge is highly accurate from a statistical point of view, it might not be interesting for the user. Few data mining algorithms produce, as part of their results, a measure of the degree of surprisingness of discovered knowledge. However, these measures can be computed in a post-processing phase, as a form of additional evaluation of the quality of discovered knowledge, complementing (rather than replacing) statistical measures of discovered knowledge accuracy. This paper presents a review of four measures of classification-rule surprisingness, discussing their main characteristics, advantages and disadvantages. Hence, the main contribution of this paper is to improve our understanding of these rule surprisingness measures, which is a step towards solving the very difficult problem of selecting the “best” rule surprisingness measure for a given application domain.