Exploratory Mining and Pruning Optimizations of Constrained Associations Rules
, 1998
Cited by 245 (41 self)
From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suffers from the following serious shortcomings: (i) lack of user exploration and control, (ii) lack of focus, and (iii) rigid notion of relationships. In effect, this model functions as a black-box, admitting little user interaction in between. We propose, in this paper, an architecture that opens up the black-box, and supports constraint-based, human-centered exploratory mining of associations. The foundation of this architecture is a rich set of constraint constructs, including domain, class, and SQL-style aggregate constraints, which enable users to clearly specify what associations are to be mined. We propose constrained association queries as a means of specifying the constraints to be satisfied by the antecedent and consequent of a mined association.
Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications
- In SIGMOD
, 1998
Cited by 101 (5 self)
Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose-coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on-the-fly and mining; tight-coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries using Association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two. Both the Cache-Mine and the SQL-OR app...
The role of Occam’s Razor in knowledge discovery
- Data Mining and Knowledge Discovery
, 1999
Cited by 70 (1 self)
Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite different ways. The first interpretation (simplicity is a goal in itself) is essentially correct, but is at heart a preference for more comprehensible models. The second interpretation (simplicity leads to greater accuracy) is much more problematic. A critical review of the theoretical arguments for and against it shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. This article argues that its continued use in KDD risks causing significant opportunities to be missed, and should therefore be restricted to the comparatively few applications where it is appropriate. The article proposes and reviews the use of domain constraints as an alternative for avoiding overfitting, and examines possible methods for handling the accuracy–comprehensibility trade-off.
Methods and Problems in Data Mining
, 1997
Cited by 64 (2 self)
Knowledge discovery in databases and data mining aim at semiautomatic tools for the analysis of large data sets. We consider some methods used in data mining, concentrating on levelwise search for all frequently occurring patterns. We show how this technique can be used in various applications. We also discuss possibilities for compiling data mining queries into algorithms, and look at the use of sampling in data mining. We conclude by listing several open research problems in data mining and knowledge discovery.
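The levelwise search mentioned in this abstract can be illustrated with a minimal sketch for frequent itemsets. This is an Apriori-style illustration under stated assumptions, not the paper's own formulation: the function name `levelwise_frequent_itemsets` and the use of an absolute support count as the threshold are hypothetical choices made for this example.

```python
from itertools import combinations


def levelwise_frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset whose count
    is at least min_support (an absolute count; an assumption here)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1 candidates: every single item seen in the data.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]

    frequent = {}
    k = 1
    while candidates:
        # One pass over the data counts the support of each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)

        # Build level k+1 candidates from pairs of frequent k-itemsets,
        # keeping only those whose k-subsets are all frequent.
        k += 1
        next_candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                next_candidates.add(union)
        candidates = list(next_candidates)

    return frequent
```

The subset check is what keeps each level small: an itemset is only counted against the data if every smaller itemset it contains was already found frequent.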
Modeling KDD processes within the inductive database framework
- DAWAK'99
, 1999
Cited by 32 (14 self)
One of the most challenging problems in data manipulation in the future is to be able to efficiently handle very large databases but also multiple induced properties or generalizations in that data. Popular examples of useful properties are association rules, and inclusion and functional dependencies. Our view of a possible approach for this task is to specify and query inductive databases, which are databases that in addition to data also contain intensionally defined generalizations about the data. We formalize this concept and show how it can be used throughout the whole process of data mining due to the closure property of the framework. We show that simple query languages can be defined using normal database terminology. We demonstrate the use of this framework to model typical data mining processes. It is then possible to perform various tasks on these descriptions like, e.g., optimizing the selection of interesting properties or comparing two processes.
Closed Set Based Discovery of Small Covers for Association Rules
- PROC. 15EMES JOURNEES BASES DE DONNEES AVANCEES, BDA
, 1999
Cited by 30 (4 self)
In this paper, we address the problem of the usefulness of the set of discovered association rules. This problem is important since real-life databases yield most of the time several thousands of rules with high confidence. We propose new algorithms based on Galois closed sets to reduce the extraction to small covers, or bases, for exact and approximate rules. Once frequent closed itemsets which constitute a generating set for both frequent itemsets and association rules have been discovered, no additional database pass is needed to derive these bases. Experiments conducted on real-life databases show that these algorithms are efficient and valuable in practice.
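To make the notion of a closed itemset concrete, the sketch below computes the Galois closure of an itemset (the intersection of all transactions containing it) and lists the frequent itemsets that equal their own closure. It is a brute-force illustration only and does not reproduce the algorithms proposed in the paper; the names `closure` and `closed_frequent_itemsets` and the absolute support threshold are assumptions made for this example.

```python
from itertools import combinations


def closure(itemset, transactions):
    """Galois closure: the intersection of all transactions containing itemset.
    Returns None when no transaction contains it."""
    covering = [frozenset(t) for t in transactions if itemset <= frozenset(t)]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def closed_frequent_itemsets(transactions, min_support):
    """Enumerate every itemset (exponential; for illustration only) and keep
    those that are frequent and equal to their own closure."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support and closure(itemset, transactions) == itemset:
                closed[itemset] = support
    return closed
```

On the four transactions `{a, b}`, `{a, b, c}`, `{a, b, c}`, `{c}` with a threshold of 2, seven itemsets are frequent but only `{c}`, `{a, b}`, and `{a, b, c}` are closed. The support of any other frequent itemset equals the support of its closure, which is why the closed sets can serve as a generating set.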
Breaking the Barrier of Transactions: Mining Inter-Transaction Association Rules
, 1999
Cited by 25 (0 self)
Most of the previous studies on mining association rules are on mining intra-transaction associations, i.e., the associations among items within the same transaction, where the notion of the transaction could be the items bought by the same customer, the events happened on the same day, etc. In this study, we break the barrier of transactions and extend the scope of mining association rules from traditional intra-transaction associations to inter-transaction associations. Mining
The 3W Model and Algebra for Unified Data Mining
, 2000
Cited by 19 (1 self)
Real data mining/analysis applications call for a framework which adequately supports knowledge discovery as a multi-step process, where the input of one mining operation can be the output of another. Previous studies, primarily focusing on fast computation of one specific mining task at a time, ignore this vital issue. Motivated by
Zakrzewicz M.: SQL-like Language for Database Mining
- ADBIS’97 Symposium
, 1997
Cited by 19 (7 self)
Data mining, also referred to as database mining or knowledge discovery in databases (KDD), is a new research area that aims at the discovery of useful information from large datasets. One of the most interesting and important research problems is discovering of different types of rules (e.g. association, characteristic, discriminant, etc.) from data. In this work we propose the new SQL-like language for data mining in relational databases, called MineSQL, developed within the scope of the data mining research project led in Poznan University of Technology. MineSQL is the extension of industry standard SQL language developed for expressing rule queries and assisting a user in rule generation, storage and retrieval. We focus on the main features of the language, its syntax and semantics, illustrated by practical examples.

