Fast Algorithms for Mining Association Rules
1994
Cited by 2159 (11 self)
We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the transaction size and the number of items in the database.
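The sketch below is a minimal Python version of the levelwise frequent-itemset search that Apriori-style algorithms build on. It makes several assumptions:

- Transactions are given as sets of item labels.
- minsup is an absolute count.
- The function name apriori and this data layout are illustrative.
- The join step is simplified: any two frequent (k-1)-itemsets whose union has k items are joined, then the subset-based prune is applied.

AprioriTid and AprioriHybrid are not modeled.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Illustrative levelwise search: return {itemset: count} for itemsets with count >= minsup."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= minsup}
    result = dict(frequent)

    k = 2
    while frequent:
        # Simplified join: union of two frequent (k-1)-itemsets that has exactly k items.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune: keep the candidate only if every (k-1)-subset is frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # One pass over the data to count the surviving candidates.
        counts = {cand: 0 for cand in candidates}
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        frequent = {s: n for s, n in counts.items() if n >= minsup}
        result.update(frequent)
        k += 1
    return result
```

Consider the five transactions {a,b,c}, {a,b}, {a,c}, {b,c}, {a,b,c} with minsup = 3. Every single item and every pair is frequent. {a,b,c} has support 2, so it is generated as a candidate and then discarded.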
Discovery of Multiple-Level Association Rules from Large Databases
In Proc. 1995 Int. Conf. Very Large Data Bases, 1995
Cited by 337 (32 self)
Previous studies on mining association rules find rules at a single concept level; however, mining association rules at multiple concept levels may lead to the discovery of more specific and concrete knowledge from data. In this study, a top-down progressive deepening method is developed for mining multiple-level association rules from large transaction databases by extending some existing association rule mining techniques. A group of variant algorithms is proposed based on the ways of sharing intermediate results, and their relative performance is tested on different kinds of data. Relaxation of the rule conditions for finding "level-crossing" association rules is also discussed in the paper.
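The top-down progressive deepening idea can be sketched roughly as follows. The sketch assumes:

- Each item is encoded as a path tuple through a concept hierarchy, for example ("milk", "2%").
- Each level has its own minimum support count.

The encoding, the function name, and the per-level thresholds are hypothetical. The sketch only finds frequent hierarchy nodes level by level, skipping descendants of infrequent nodes. It does not generate multi-item rules or reproduce the paper's algorithm variants.

```python
from collections import Counter


def frequent_nodes_by_level(transactions, minsup_per_level):
    """Hypothetical helper: top-down search over a concept hierarchy.

    Items are path tuples; the level-l ancestor of an item is item[:l].
    A level-l node is counted only if its level-(l-1) parent was frequent.
    """
    result = {}
    allowed_parents = {()}  # the root counts as frequent
    for level, minsup in enumerate(minsup_per_level, start=1):
        counts = Counter()
        for t in transactions:
            # Count each ancestor at most once per transaction,
            # and only under parents that survived the previous level.
            nodes = {
                item[:level]
                for item in t
                if len(item) >= level and item[:level - 1] in allowed_parents
            }
            counts.update(nodes)
        frequent = {node for node, n in counts.items() if n >= minsup}
        result[level] = frequent
        if not frequent:
            break
        allowed_parents = frequent  # deepen only below frequent nodes
    return result
```

With thresholds [3, 2], a node such as ("eggs",) that is infrequent at level 1 causes its children to be skipped at level 2, even if a child would meet the lower level-2 threshold on its own.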
Levelwise Search and Borders of Theories in Knowledge Discovery
1997
Cited by 177 (12 self)
One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences S ⊆ L, determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
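The border concept can be made concrete for the frequent-set case. The sketch below computes two borders of a downward-closed collection of itemsets:

- The positive border: its maximal members.
- The negative border: the minimal sets not in the collection.

Representing sentences as Python frozensets over a small item universe is an assumption made for illustration, and the function name is hypothetical. In the levelwise analysis, the sentences on which the predicate is evaluated are those in the theory together with its negative border.

```python
def borders(items, theory):
    """Hypothetical helper: positive and negative borders of a downward-closed theory.

    theory: set of frozensets, closed under subsets and containing frozenset().
    """
    theory = set(theory)
    # Positive border: members with no one-item extension inside the theory.
    positive = {
        x for x in theory
        if not any((x | {i}) in theory for i in items if i not in x)
    }
    # Every minimal non-member is a one-item extension of some member.
    candidates = {x | {i} for x in theory for i in items if i not in x}
    # Negative border: non-members all of whose immediate subsets are members.
    negative = {
        c for c in candidates
        if c not in theory and all((c - {i}) in theory for i in c)
    }
    return positive, negative
```

For items a, b, c and the theory {∅, {a}, {b}, {c}, {a,b}}, the positive border is {{a,b}, {c}} and the negative border is {{a,c}, {b,c}}. {a,b,c} is excluded because its subset {b,c} is not in the theory.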
Systems for Knowledge Discovery in Databases
IEEE Transactions on Knowledge and Data Engineering, 1993
Cited by 88 (8 self)
The automated discovery of knowledge in databases is becoming increasingly important as the world's wealth of data continues to grow exponentially. Knowledge-discovery systems face challenging problems from real-world databases which tend to be dynamic, incomplete, redundant, noisy, sparse, and very large. This paper addresses these problems and describes some techniques for handling them. A model of an idealized knowledge-discovery system is presented as a reference for studying and designing new systems. This model is used in the comparison of three systems: CoverStory, EXPLORA, and the Knowledge Discovery Workbench. The deficiencies of existing systems relative to the model reveal several open problems for future research.
Methods and Problems in Data Mining
1997
Cited by 64 (2 self)
Knowledge discovery in databases and data mining aim at semiautomatic tools for the analysis of large data sets. We consider some methods used in data mining, concentrating on levelwise search for all frequently occurring patterns. We show how this technique can be used in various applications. We also discuss possibilities for compiling data mining queries into algorithms, and look at the use of sampling in data mining. We conclude by listing several open research problems in data mining and knowledge discovery.
TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies
1999
Cited by 50 (0 self)
In this paper, we also consider the approximate dependency inference task: given a relation r and a threshold ε, find all minimal non-trivial approximate dependencies ...
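One way to make "approximate dependency" concrete is the g3 error measure, which this sketch assumes. g3 is the minimum fraction of rows that must be deleted for X -> A to hold exactly, and it is compared against the threshold ε.

The function names and the list-of-dicts row layout are hypothetical. The sketch tests a single candidate dependency rather than searching for all minimal ones.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Hypothetical helper: minimum fraction of rows to delete so that lhs -> rhs holds.

    rows: list of dicts; lhs: tuple of attribute names; rhs: one attribute name.
    """
    if not rows:
        return 0.0
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[a] for a in lhs)][row[rhs]] += 1
    # Within each lhs-group, keep the rows carrying the most common rhs value.
    kept = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return (len(rows) - kept) / len(rows)


def holds_approximately(rows, lhs, rhs, epsilon):
    """Assumed acceptance rule: the dependency holds if its g3 error is at most epsilon."""
    return g3_error(rows, lhs, rhs) <= epsilon
```

Take four rows where zip 00100 maps to Helsinki twice and to Espoo once, and zip 02100 maps to Espoo. Deleting one row makes zip -> city exact, so the error is 0.25.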
Efficient Discovery of Functional and Approximate Dependencies Using Partitions (Extended version)
In ICDE, 1997
Cited by 46 (1 self)
Discovery of functional dependencies from relations has been identified as an important database analysis technique. In this paper, we present a new approach for finding functional dependencies from large databases, based on partitioning the set of rows with respect to their attribute values. The use of partitions makes the discovery of approximate functional dependencies easy and efficient, and the erroneous or exceptional rows can be identified easily. Experiments show that the new algorithm is efficient in practice. For benchmark databases the running times are improved by several orders of magnitude over previously published results. The algorithm is also applicable to much larger datasets than the previous methods.
Computing Reviews Categories and Subject Descriptors: H.3.1 Content Analysis and Indexing; F.2.2 Nonnumerical Algorithms and Problems; I.2.6 Learning
General Terms: Algorithms, Experimentation
Additional Key Words and Phrases: Knowledge Discovery, Data Mining, Func...
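A minimal sketch of the partition idea follows, assuming rows are dictionaries keyed by attribute name. It relies on three definitions:

- π_X groups row indices that agree on X.
- The product of π_X and π_Y refines one partition by the other, giving π_{X ∪ Y}.
- X -> A holds exactly when π_X and π_{X ∪ {A}} have the same number of classes.

The helper names are hypothetical. Unlike an optimized implementation, this sketch keeps singleton classes for simplicity.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Hypothetical helper: classes of row indices that agree on every attribute in attrs."""
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(idx)
    return [frozenset(c) for c in classes.values()]


def partition_product(p1, p2):
    """Refine p1 by p2; for p1 = pi_X and p2 = pi_Y this yields pi_{X u Y}."""
    owner = {}  # row index -> id of its class in p2
    for cid, cls in enumerate(p2):
        for idx in cls:
            owner[idx] = cid
    result = []
    for cls in p1:
        split = defaultdict(set)
        for idx in cls:
            split[owner[idx]].add(idx)
        result.extend(frozenset(s) for s in split.values())
    return result


def fd_holds(rows, lhs, rhs):
    """X -> A holds iff refining pi_X by pi_A does not split any class."""
    p_x = partition(rows, lhs)
    p_xa = partition_product(p_x, partition(rows, [rhs]))
    return len(p_x) == len(p_xa)
```

The comparison by class count works because π_{X ∪ {A}} always refines π_X. Equal counts therefore mean the two partitions are identical, i.e. rows that agree on X also agree on A.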
On an algorithm for finding all interesting sentences (Extended Abstract)
In Cybernetics and Systems, Volume II, The Thirteenth European Meeting on Cybernetics and Systems Research, 1996
Cited by 31 (9 self)
Knowledge discovery in databases (KDD), also called data mining, has recently received wide attention from practitioners and researchers. One of the basic problems in KDD is the following: given a data set r, a class L of sentences defining subgroups or properties of r, and an interestingness predicate, find all sentences of L deemed interesting by the interestingness predicate. In this paper we analyze a simple and well-known levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. We also consider the verification problem of a KDD process: given r and a set of sentences T ⊆ L, determine whether T is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
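The connection to hypergraph transversals can be illustrated in the frequent-set setting. The negative border of a collection equals the minimal transversals of the complements of its maximal sets. Verifying a candidate answer then reduces to two checks:

- Every maximal set must be interesting.
- No computed transversal may be interesting.

This rests on the assumption that the predicate is monotone, i.e. every subset of an interesting set is interesting. The brute-force transversal routine and the function names are hypothetical and suited only to tiny universes.

```python
from itertools import combinations


def minimal_transversals(universe, edges):
    """Hypothetical brute force: all minimal sets that intersect every edge."""
    found = []
    # Enumerate by increasing size, so a transversal is minimal exactly when
    # it contains no previously found (smaller) transversal.
    universe = sorted(universe)
    for size in range(len(universe) + 1):
        for combo in combinations(universe, size):
            t = frozenset(combo)
            if all(t & e for e in edges) and not any(f <= t for f in found):
                found.append(t)
    return set(found)


def verify_theory(items, candidate_max, is_interesting):
    """Hypothetical check that the downward closure of candidate_max is exactly the theory.

    Assumes is_interesting is monotone and candidate_max lists the maximal sets.
    """
    items = frozenset(items)
    complements = [items - x for x in candidate_max]
    negative = minimal_transversals(items, complements)  # negative border
    return all(is_interesting(x) for x in candidate_max) and not any(
        is_interesting(x) for x in negative
    )
```

Suppose the interesting sets over {a, b, c} are exactly the subsets of {a,b} or {c}. Then the candidate maximal sets [{a,b}, {c}] verify. The candidate [{a}, {c}] fails, because its negative border contains {b}, which is interesting.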
Mining Knowledge at Multiple Concept Levels
In CIKM, 1995
Cited by 16 (1 self)
Most studies on data mining have focused on mining rules at a single concept level, i.e., either at the primitive level or at a rather high concept level. However, it is often desirable to discover knowledge at multiple concept levels. Mining knowledge at multiple levels may help database users find interesting rules that would be difficult to discover otherwise, and view database contents at different abstraction levels and from different angles. Methods for mining knowledge at multiple concept levels can often be developed by extending existing data mining techniques. Moreover, efficient processing and interactive mining of multiple-level rules often require techniques such as step-by-step generalization/specialization or progressive deepening of the knowledge mining process. Other issues, such as visual representation of knowledge at multiple levels and "redundant" rule filtering, should also be studied in depth.
Discovery Of Multiple-Level Rules From Large Databases
1996
Cited by 6 (0 self)
With the widespread computerization of business, government, and science, the efficient and effective discovery of interesting information from large databases has become essential. Data mining, or Knowledge Discovery in Databases (KDD), has emerged as a solution to the data analysis problems faced by many organizations. Previous studies on data mining have focused on the discovery of knowledge at a single conceptual level, either at the primitive level or at a rather high conceptual level. However, it is often desirable to discover knowledge at multiple conceptual levels, which provides a spectrum of understanding, from general to specific, of the underlying data. In this thesis, we first introduce the conceptual hierarchy, a hierarchical organization of the data in the databases. Two algorithms for the dynamic adjustment of conceptual hierarchies are developed, as well as another algorithm for the automatic generation of conceptual hierarchies for numerical attributes. In addition, a set of ...

