Results 1  10
of
156
Data Mining: An Overview from Database Perspective
 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 1996
"... Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have sh ..."
Abstract

Cited by 515 (26 self)
 Add to MetaCart
(Show Context)
Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity of major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information providing services, such as data warehousing and online services over the Internet, also call for various data mining techniques to better understand user behavior, to improve the service provided, and to increase the business opportunities. In response to such a demand, this article is to provide a survey, from a database researcher's point of view, on the data mining techniques developed recently. A classification of the available data mining techniques is provided and a comparative study of such techniques is presented.
Clausal Discovery
, 1997
"... The clausal discovery engine Claudien is presented. Claudien is an inductive logic programming engine that fits in the descriptive data mining paradigm. Claudien addresses characteristic induction from interpretations, a task which is related to existing formalisations of induction in logic. In ch ..."
Abstract

Cited by 197 (34 self)
 Add to MetaCart
The clausal discovery engine Claudien is presented. Claudien is an inductive logic programming engine that fits in the descriptive data mining paradigm. Claudien addresses characteristic induction from interpretations, a task which is related to existing formalisations of induction in logic. In characteristic induction from interpretations, the regularities are represented by clausal theories, and the data using Herbrand interpretations. Because Claudien uses clausal logic to represent hypotheses, the regularities induced typically involve multiple relations or predicates. Claudien also employs a novel declarative bias mechanism to define the set of clauses that may appear in a hypothesis.
An Algorithm for MultiRelational Discovery of Subgroups
, 1997
"... We consider the problem of finding statistically unusual subgroups in a multirelation database, and extend previous work on singlerelation subgroup discovery. We give a precise definition of the multirelation subgroup discovery task, propose a specific form of declarative bias based on foreign ..."
Abstract

Cited by 191 (8 self)
 Add to MetaCart
We consider the problem of finding statistically unusual subgroups in a multirelation database, and extend previous work on singlerelation subgroup discovery. We give a precise definition of the multirelation subgroup discovery task, propose a specific form of declarative bias based on foreign links as a means of specifying the hypothesis space, and show how propositional evaluation functions can be adapted to the multirelation setting. We then describe an algorithm for this problem setting that uses optimistic estimate and minimal support pruning, an optimal refinement operator and sampling to ensure efficiency and can easily be parallelized.
Interestingness measures for data mining: a survey
 ACM Computing Surveys
"... Interestingness measures play an important role in data mining, regardless of the kind of patterns being mined. These measures are intended for selecting and ranking patterns according to their potential interest to the user. Good measures also allow the time and space costs of the mining process to ..."
Abstract

Cited by 151 (2 self)
 Add to MetaCart
Interestingness measures play an important role in data mining, regardless of the kind of patterns being mined. These measures are intended for selecting and ranking patterns according to their potential interest to the user. Good measures also allow the time and space costs of the mining process to be reduced. This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.
Rule Evaluation Measures: A Unifying View
 Proceedings of the 9th International Workshop on Inductive Logic Programming (ILP99
, 1999
"... Numerous measures are used for performance evaluation in machine learning. In predictive knowledge discovery, the most frequently used measure is classi cation accuracy. With new tasks being addressed in knowledge discovery, new measures appear. In descriptive knowledge discovery, where induced rule ..."
Abstract

Cited by 93 (16 self)
 Add to MetaCart
(Show Context)
Numerous measures are used for performance evaluation in machine learning. In predictive knowledge discovery, the most frequently used measure is classi cation accuracy. With new tasks being addressed in knowledge discovery, new measures appear. In descriptive knowledge discovery, where induced rules are not primarily intended for classi cation, new measures used are novelty in clausal and subgroup discovery, and support and con dence in association rule learning. Additional measures are needed as many descriptive knowledge discovery tasks involve the induction of a large set of redundant rules and the problem is the ranking and ltering of the induced rule set. In this paper we develop a unifying view on existing measures for predictive and descriptive induction. We provide a common terminology and notation by means of contingency tables. We demonstrate how to trade o most of these measures, by using what we call weighted relative accuracy. The paper furthermore demonstrates that many rule evaluation measures developed for predictive knowledge discovery can be adapted to descriptive knowledge discovery tasks.
Subgroup Discovery with CN2SD
 Journal of Machine Learning Research
, 2004
"... discovery. The goal of subgroup discovery is to find rules describing subsets of the population that are sufficiently large and statistically unusual. The paper presents a subgroup discovery algorithm, CN2SD, developed by modifying parts of the CN2 classification rule learner: its covering algorit ..."
Abstract

Cited by 79 (12 self)
 Add to MetaCart
discovery. The goal of subgroup discovery is to find rules describing subsets of the population that are sufficiently large and statistically unusual. The paper presents a subgroup discovery algorithm, CN2SD, developed by modifying parts of the CN2 classification rule learner: its covering algorithm, search heuristic, probabilistic classification of instances, and evaluation measures. Experimental evaluation of CN2SD on 23 UCI data sets shows substantial reduction of the number of induced rules, increased rule coverage and rule significance, as well as slight improvements in terms of the area under ROC curve, when compared with the CN2 algorithm. Application of CN2SD to a large traffic accident data set confirms these findings.
Discovering significant patterns
, 2007
"... Pattern discovery techniques, such as association rule discovery, explore large search spaces of potential patterns to find those that satisfy some userspecified constraints. Due to the large number of patterns considered, they suffer from an extreme risk of type1 error, that is, of finding patter ..."
Abstract

Cited by 59 (4 self)
 Add to MetaCart
(Show Context)
Pattern discovery techniques, such as association rule discovery, explore large search spaces of potential patterns to find those that satisfy some userspecified constraints. Due to the large number of patterns considered, they suffer from an extreme risk of type1 error, that is, of finding patterns that appear due to chance alone to satisfy the constraints on the sample data. This paper proposes techniques to overcome this problem by applying wellestablished statistical practices. These allow the user to enforce a strict upper limit on the risk of experimentwise error. Empirical studies demonstrate that standard pattern discovery techniques can discover numerous spurious patterns when applied to random data and when applied to realworld data result in large numbers of patterns that are rejected when subjected to sound statistical evaluation. They also reveal that a number of pragmatic choices about how such tests are performed can greatly affect their power.
ExpertGuided Subgroup Discovery: Methodology and Application
 Journal of Artificial Intelligence Research
, 2002
"... This paper presents an approach to expertguided subgroup discovery. The main step of the subgroup discovery process, the induction of subgroup descriptions, is performed by a heuristic beam search algorithm, using a novel parametrized definition of rule quality which is analyzed in detail. The othe ..."
Abstract

Cited by 55 (10 self)
 Add to MetaCart
This paper presents an approach to expertguided subgroup discovery. The main step of the subgroup discovery process, the induction of subgroup descriptions, is performed by a heuristic beam search algorithm, using a novel parametrized definition of rule quality which is analyzed in detail. The other important steps of the proposed subgroup discovery process are the detection of statistically significant properties of selected subgroups and subgroup visualization: statistically significant properties are used to enrich the descriptions of induced subgroups, while the visualization shows subgroup properties in the form of distributions of the numbers of examples in the subgroups. The approach is illustrated by the results obtained for a medical problem of early detection of patient risk groups.
Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining
 Journal of Machine Learning Research
"... This paper gives a survey of contrast set mining (CSM), emerging pattern mining (EPM), and subgroup discovery (SD) in a unifying framework named supervised descriptive rule discovery. While all these research areas aim at discovering patterns in the form of rules induced from labeled data, they use ..."
Abstract

Cited by 54 (0 self)
 Add to MetaCart
(Show Context)
This paper gives a survey of contrast set mining (CSM), emerging pattern mining (EPM), and subgroup discovery (SD) in a unifying framework named supervised descriptive rule discovery. While all these research areas aim at discovering patterns in the form of rules induced from labeled data, they use different terminology and task definitions, claim to have different goals, claim to use different rule learning heuristics, and use different means for selecting subsets of induced patterns. This paper contributes a novel understanding of these subareas of data mining by presenting a unified terminology, by explaining the apparent differences between the learning tasks as variants of a unique supervised descriptive rule discovery task and by exploring the apparent differences between the approaches. It also shows that various rule learning heuristics used in CSM, EPM and SD algorithms all aim at optimizing a trade off between rule coverage and precision. The commonalities (and differences) between the approaches are showcased on a selection of best known variants of CSM, EPM and SD algorithms. The paper also provides a critical survey of existing supervised descriptive rule discovery visualization methods.
A Foundational Approach to Mining Itemset Utilities from Databases
 Proceedings of the Third SIAM International Conference on Data Mining
, 2004
"... Most approaches to mining association rules implicitly consider the utilities of the itemsets to be equal. We assume that the utilities of itemsets may differ, and identify the high utility itemsets based on information in the transaction database and external information about utilities. Our theore ..."
Abstract

Cited by 47 (0 self)
 Add to MetaCart
(Show Context)
Most approaches to mining association rules implicitly consider the utilities of the itemsets to be equal. We assume that the utilities of itemsets may differ, and identify the high utility itemsets based on information in the transaction database and external information about utilities. Our theoretical analysis of the resulting problem lays the foundation for future utility mining algorithms. 1