The effect of threshold values on association rule based classification accuracy
Journal of Data and Knowledge Engineering, 2007
Cited by 7 (4 self)
Classification Association Rule Mining (CARM) systems operate by applying an Association Rule Mining (ARM) method to obtain classification rules from a training set of previously-classified data. The rules thus generated will be influenced by the choice of ARM parameters employed by the algorithm (typically support and confidence threshold values). In this paper we examine the effect that this choice has on the predictive accuracy of CARM methods. We show that the accuracy can almost always be improved by a suitable choice of parameters, and describe a hill-climbing method for finding the best parameter settings. We also demonstrate that the proposed hill-climbing method is most effective when coupled with a fast CARM algorithm such as the TFPC algorithm, which is also described.
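The idea of hill-climbing over support and confidence thresholds can be illustrated with a minimal, generic sketch. It is not the procedure from the paper: the function names, the step-halving schedule, and the toy accuracy surface are all assumptions made for illustration. In practice `accuracy` would train a CARM classifier at the given thresholds and return its measured accuracy.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]  # (support threshold %, confidence threshold %)


def hill_climb(
    accuracy: Callable[[float, float], float],
    start: Point,
    step: float = 1.0,
    min_step: float = 0.125,
    bounds: Tuple[float, float] = (0.0, 100.0),
) -> Tuple[Point, float]:
    """Greedy search over (support, confidence); every name here is illustrative."""
    lo, hi = bounds
    best, best_acc = start, accuracy(*start)
    while step >= min_step:
        s, c = best
        # Four axis-aligned neighbours of the current point; drop any out of bounds.
        neighbours = [(s + step, c), (s - step, c), (s, c + step), (s, c - step)]
        neighbours = [(a, b) for a, b in neighbours if lo <= a <= hi and lo <= b <= hi]
        improved = False
        for cand in neighbours:
            acc = accuracy(*cand)
            if acc > best_acc:
                best, best_acc, improved = cand, acc, True
        if not improved:
            step /= 2  # no better neighbour: search more finely around the current best
    return best, best_acc


def toy_accuracy(support: float, confidence: float) -> float:
    """Hypothetical stand-in for 'build a classifier and measure accuracy'; peak at (3, 70)."""
    return 100.0 - (support - 3.0) ** 2 - 0.01 * (confidence - 70.0) ** 2


best, acc = hill_climb(toy_accuracy, start=(1.0, 50.0))
print(best, acc)
```

Each call to `accuracy` stands for a full mining-and-evaluation run, which is why a search like this benefits from a fast underlying CARM algorithm.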
Obtaining Best Parameter Values for Accurate Classification
Proc. of 5th IEEE International Conference on Data Mining, 2005
Cited by 7 (6 self)
In this paper we examine the effect that the choice of support and confidence thresholds has on the accuracy of classifiers obtained by Classification Association Rule Mining. We show that accuracy can almost always be improved by a suitable choice of threshold values, and we describe a method for finding the best values. We present results that demonstrate this approach can obtain higher accuracy without the need for coverage analysis of the training data.
Statistical Identification of Key Phrases for Text Classification
Cited by 3 (2 self)
Algorithms for text classification generally involve two stages, the first of which aims to identify textual elements (words and/or phrases) that may be relevant to the classification process. This stage often involves an analysis of the text that is both language-specific and possibly domain-specific, and may also be computationally costly. In this paper we examine a number of alternative keyword-generation methods and phrase-construction strategies that identify key words and phrases by simple, language-independent statistical properties. We present results that demonstrate that these methods can produce good classification accuracy, with the best results being obtained using a phrase-based approach. Keywords: Text Classification, Text Preprocessing.
A Novel Rule Ordering Approach in Classification Association Rule Mining
In Proceedings of the 5th International Conference on Machine Learning and Data Mining (MLDM-07), 2007
Cited by 3 (3 self)
A Classification Association Rule (CAR), a common type of mined knowledge in Data Mining, describes an implicative co-occurring relationship between a set of binary-valued data-attributes (items) and a pre-defined class, expressed in the form of an “antecedent ⇒ consequent-class” rule. Classification Association Rule Mining (CARM) is a recent Classification Rule Mining (CRM) approach that builds an Association Rule Mining (ARM) based classifier using CARs. Regardless of which particular methodology is used to build it, a classifier is usually presented as an ordered CAR list, based on an applied rule ordering strategy. Five existing rule ordering mechanisms can be identified: (1) Confidence-Support-size_of_Antecedent (CSA), (2) size_of_Antecedent-Confidence-Support (ACS), (3) Weighted Relative Accuracy (WRA), (4) Laplace Accuracy, and (5) χ² Testing. In this paper, we divide the above mechanisms into two groups: (i) those that follow a pure “support-confidence” framework, and (ii) those that assign an additive score. We consequently propose a hybrid rule ordering approach by combining one approach taken from (i) and another approach taken from (ii). The experimental results show that the proposed rule ordering approach performs well with respect to the accuracy of classification.
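The two "support-confidence" orderings named above, CSA and ACS, can be sketched as multi-key sorts. The abstract gives only the key order, so the sort directions used here are an assumption: higher confidence and support first in both, smaller antecedents first as the CSA tie-break, and larger (more specific) antecedents first in ACS. The `Rule` class and the sample rules are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    """A hypothetical CAR: antecedent items => class label, with its measures."""
    antecedent: FrozenSet[str]
    consequent: str
    support: float
    confidence: float


def order_csa(rules: List[Rule]) -> List[Rule]:
    # Confidence, then Support, then size of Antecedent (directions assumed).
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.antecedent)))


def order_acs(rules: List[Rule]) -> List[Rule]:
    # size of Antecedent, then Confidence, then Support (directions assumed).
    return sorted(rules, key=lambda r: (-len(r.antecedent), -r.confidence, -r.support))


r1 = Rule(frozenset({"a"}), "x", support=0.30, confidence=0.90)
r2 = Rule(frozenset({"a", "b"}), "y", support=0.10, confidence=0.90)
r3 = Rule(frozenset({"c"}), "x", support=0.20, confidence=0.95)

print([r.consequent for r in order_csa([r1, r2, r3])])
print([r.consequent for r in order_acs([r1, r2, r3])])
```

With these sample rules, CSA places the most confident rule `r3` first, while ACS promotes the two-item rule `r2` ahead of it, which shows how the choice of ordering can change which rule fires first during classification.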
EMADS: An Extendible Multi-Agent Data Miner
Cited by 2 (2 self)
In this paper we describe EMADS, an Extendible Multi-Agent Data mining System. The EMADS vision is that of a community of data mining agents, contributed by many individuals, interacting under decentralised control to address data mining requests. EMADS is seen both as an end user application and a research tool. This paper details the EMADS vision, the associated conceptual framework and the current implementation. Although EMADS may be applied to many data mining tasks, the study described here, for the sake of brevity, concentrates on agent based data classification. A full description of EMADS is presented.
Computer Supported Cooperative Working, distributed computation and resource
In this paper we describe EMADS, an Extendible Multi-Agent Data mining System. The EMADS vision is that of a community of data mining agents, contributed by many individuals, interacting under decentralised control to address data mining requests. EMADS is seen both as an end user application and a research tool. This paper details the EMADS vision, the associated conceptual framework and the current implementation. Although EMADS may be applied to many data mining tasks, the study described here, for the sake of brevity, concentrates on agent based data classification. A full description of EMADS is presented. Keywords: Multi-Agent Data Mining (MADM), Classifier Generation.
Abstract CSCR010: Second Year Report
My research focuses on combining Distributed Data Mining (DDM) with Multi-Agent Systems (MAS), using the possibilities that MAS offer to improve overall data mining (DM) performance. DM technology has emerged as a means of identifying patterns and trends in large quantities of data. DM normally relies on data integration to build a data warehouse that gathers all data at a central site, and then runs an algorithm against that data to extract useful models for prediction and knowledge evaluation. However, no single data mining technique has been proven appropriate for every domain and data set. Multi-agent systems often deal with complex applications that require distributed problem solving. In many applications the individual and collective behavior of the agents depends on the observed data from distributed sources. The field of DDM deals with these challenges in analyzing distributed data and offers many algorithmic solutions for performing different data analysis and mining operations in a fundamentally distributed manner that pays careful attention to resource constraints. Since multi-agent systems are often distributed, and agents have proactive and reactive features that are very useful for Knowledge Management Systems, combining DDM with MAS for data intensive applications is appealing.
Hybrid Rule Ordering in Classification Association Rule Mining
Classification Association Rule Mining (CARM) is an approach to classifier generation that builds an Association Rule Mining based classifier using Classification Association Rules (CARs). Regardless of which particular CARM algorithm is used, a similar set of CARs is always generated from data, and a classifier is usually presented as an ordered list of CARs, based on a selected rule ordering strategy. Hence to produce an accurate classifier, it is essential to develop a rational rule ordering mechanism. In the past decade, a number of rule ordering strategies have been introduced. Six major ones can be identified: Confidence Support & size-of-Antecedent (CSA), size-of-
An Association Rule Miner with Semi-Autonomic Threshold Setting
Association rule mining is well-known to depend heavily on a support threshold parameter, and on one or more thresholds for intensity of implication; among these measures, confidence is most often used and, sometimes, related alternatives such as lift, leverage, improvement, or all-confidence are employed, either separately or jointly with confidence. We describe here an association mining system which requires the user to set a single parameter, of quite clear intuitive semantics, and then uses the given value to compute autonomously the corresponding thresholds for support, confidence, blocking factor (which is a slight reformulation of improvement) and confidence width (which is a complementary, recently introduced measure of novelty for association rules). We argue that having a single parameter is desirable, suggest a number of desiderata for the conceptual basis of such a parameter, and explain how our implementation meets them and is able to find meaningful association rules with less need for domain intuition from the user.
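For readers unfamiliar with the measures named above, the following sketch computes support, confidence, lift, and leverage for a single rule X ⇒ Y using their standard textbook definitions. It does not reproduce the system described in the abstract; the helper names and the toy transactions are assumptions, and blocking factor and confidence width are not covered.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> Fraction:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def rule_measures(
    x: FrozenSet[str], y: FrozenSet[str], transactions: List[FrozenSet[str]]
) -> Dict[str, Fraction]:
    """Standard measures for the rule X => Y; assumes X and Y each occur at least once."""
    sx, sy = support(x, transactions), support(y, transactions)
    sxy = support(x | y, transactions)
    confidence = sxy / sx
    return {
        "support": sxy,               # P(X and Y)
        "confidence": confidence,     # P(Y | X)
        "lift": confidence / sy,      # P(Y | X) / P(Y)
        "leverage": sxy - sx * sy,    # P(X and Y) - P(X) P(Y)
    }


# Hypothetical toy data set of five transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]

m = rule_measures(frozenset({"bread"}), frozenset({"milk"}), transactions)
print(m)
```

Exact fractions are used so the values can be checked by hand. Here the rule has confidence 3/4 but lift below 1, a small reminder that a confidence threshold alone does not indicate whether the antecedent actually raises the likelihood of the consequent.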
Two Measures of Objective Novelty in Association Rule Mining
Association rule mining is well-known to depend heavily on a support threshold parameter, and on one or more thresholds for intensity of implication; among these measures, confidence is most often used and, sometimes, related alternatives such as lift, leverage, improvement, or all-confidence are employed, either separately or jointly with confidence. We remain within the support-and-confidence framework in an attempt at studying complementary notions, which have the goal of measuring relative forms of objective novelty or surprisingness of each individual rule with respect to other rules that hold in the same dataset. We measure novelty through the extent to which the confidence value is robust, taken relative to the confidences of related (for instance, logically stronger) rules, as opposed to the absolute consideration of the single rule at hand. We consider two variants of this idea and analyze their logical and algorithmic properties. Since this approach has the drawback of requiring further parameters, we also propose a framework in which the

