Mining Quantitative Association Rules in Large Relational Tables
, 1996
Cited by 304 (2 self)
We introduce the problem of mining association rules in large relational tables containing both quantitative and categorical attributes. An example of such an association might be "10% of married people between age 50 and 60 have at least 2 cars". We deal with quantitative attributes by fine-partitioning the values of the attribute and then combining adjacent partitions as necessary. We introduce measures of partial completeness which quantify the information lost due to partitioning. A direct application of this technique can generate too many similar rules. We tackle this problem by using a "greater-than-expected-value" interest measure to identify the interesting rules in the output. We give an algorithm for mining such quantitative association rules. Finally, we describe the results of using this approach on a real-life dataset.
1 Introduction
Data mining, also known as knowledge discovery in databases, has been recognized as a new area for database research. The problem of discove...
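A minimal sketch of the partition-and-combine idea follows. It assumes a tiny hypothetical table of (age, married, num_cars) rows and hand-picked base intervals for age; the helper names (`combined_ranges`, `rule_stats`, `expected_support`) are illustrative and are not the paper's algorithm. It merges runs of adjacent base intervals while support stays at or below a maximum, computes support and confidence for the example rule shape, and applies a simplified greater-than-expected-value check with a factor of 1: the expected support of a narrowed itemset is its wider ancestor's support scaled by the ratio of the narrowed item's support to the wider item's.

```python
# Toy relational table (age, married, num_cars): hypothetical data, not from the paper.
ROWS = [
    (23, "no", 1), (27, "yes", 1), (31, "no", 0), (35, "yes", 2),
    (42, "yes", 1), (51, "yes", 2), (54, "yes", 3), (57, "no", 1),
    (58, "yes", 2), (63, "yes", 1),
]
AGE, MARRIED, CARS = 0, 1, 2

# Fine partition of the quantitative attribute "age" into base intervals.
BASE_AGE = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69)]


def support(pred):
    """Fraction of rows satisfying the predicate."""
    return sum(1 for r in ROWS if pred(r)) / len(ROWS)


def in_range(col, lo, hi):
    """Predicate for the quantitative item <col: lo..hi>."""
    return lambda r: lo <= r[col] <= hi


def combined_ranges(base, col, max_support):
    """Merge runs of adjacent base intervals, keeping ranges whose support <= max_support."""
    ranges = []
    for i in range(len(base)):
        for j in range(i, len(base)):
            lo, hi = base[i][0], base[j][1]
            if support(in_range(col, lo, hi)) > max_support:
                break  # widening the range further can only increase support
            ranges.append((lo, hi))
    return ranges


def rule_stats(antecedent, consequent):
    """(support, confidence) of the rule antecedent => consequent."""
    both = support(lambda r: antecedent(r) and consequent(r))
    ante = support(antecedent)
    return both, (both / ante if ante else 0.0)


def expected_support(narrow_item, wide_item, wide_itemset_support):
    """Support expected after narrowing one item's range, assuming the other items are
    spread over the narrowed range in the same proportion as over the wider one."""
    return support(narrow_item) / support(wide_item) * wide_itemset_support


# Usage: the rule <Age: 50..59> and <Married: yes> => <NumCars: 2..>
age_ranges = combined_ranges(BASE_AGE, AGE, max_support=0.5)
married = lambda r: r[MARRIED] == "yes"
ante = lambda r: in_range(AGE, 50, 59)(r) and married(r)
sup, conf = rule_stats(ante, lambda r: r[CARS] >= 2)

# Is the narrower itemset more frequent than its wider ancestor predicts?
wide = lambda r: in_range(AGE, 40, 59)(r) and married(r)
expected = expected_support(in_range(AGE, 50, 59), in_range(AGE, 40, 59), support(wide))
interesting = support(ante) > expected
```

Here the narrower itemset (ages 50 to 59, married) has support 0.3, below the roughly 0.32 predicted from the 40 to 59 range, so under this simplified check it would not be reported as interesting even though its rule has confidence 1.0.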
Fast Sequential and Parallel Algorithms for Association Rule Mining: A Comparison
, 1995
Cited by 61 (0 self)
The field of knowledge discovery in databases, or "Data Mining", has received increasing attention in recent years as large organizations have begun to realize the potential value of the information stored implicitly in their databases. One specific data mining task is the mining of Association Rules, particularly from retail data. The task is to determine patterns (or rules) that characterize the shopping behavior of customers from a large database of previous consumer transactions. The rules can then be used to focus marketing efforts such as product placement and sales promotions. Because early algorithms required an unpredictably large number of I/O operations, reducing I/O cost has been the primary target of the algorithms presented in the literature. One of the most recently proposed algorithms, called PARTITION, uses a new TID-list data representation and a new partitioning technique. The partitioning technique reduces I/O cost to a constant amount by processing one datab...
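The TID-list representation and the partitioning idea can be sketched as below, assuming an in-memory list of baskets and a minimum support given as an integer percentage (both assumptions for illustration). Each partition is mined locally by intersecting TID-lists; the union of local results forms the candidate set, and a second scan counts the candidates over the whole database, so the number of passes stays fixed regardless of itemset length. This is a simplified reading of PARTITION, not its reference implementation.

```python
from itertools import combinations


def tidlists(transactions):
    """Vertical layout: map each 1-itemset to the set of transaction IDs containing it."""
    lists = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            lists.setdefault(frozenset([item]), set()).add(tid)
    return lists


def local_frequent(transactions, min_count):
    """Level-wise search inside one partition; supports come from TID-list intersections."""
    level = {s: t for s, t in tidlists(transactions).items() if len(t) >= min_count}
    frequent = set(level)
    while level:
        nxt = {}
        for a, b in combinations(list(level), 2):
            cand = a | b
            if len(cand) == len(a) + 1 and cand not in nxt:
                tids = level[a] & level[b]  # TIDs of transactions containing both a and b
                if len(tids) >= min_count:
                    nxt[cand] = tids
        frequent |= set(nxt)
        level = nxt
    return frequent


def partition_mine(transactions, min_pct, n_parts):
    """Two-pass, PARTITION-style mining; min_pct is the minimum support in percent."""
    size = -(-len(transactions) // n_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Pass 1: a globally frequent itemset is locally frequent in at least one partition.
    candidates = set()
    for part in parts:
        local_min = max(1, -(-min_pct * len(part) // 100))  # exact integer ceiling
        candidates |= local_frequent(part, local_min)
    # Pass 2: one more scan of the whole database counts every candidate.
    counts = dict.fromkeys(candidates, 0)
    for items in transactions:
        present = set(items)
        for c in candidates:
            if c <= present:
                counts[c] += 1
    return {c: n for c, n in counts.items() if n * 100 >= min_pct * len(transactions)}


# Hypothetical basket data.
BASKETS = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
result = partition_mine(BASKETS, min_pct=60, n_parts=2)
```

Using integer percentages keeps the local thresholds exact and avoids floating-point rounding inside the ceiling. In this example {bread, milk, diapers} is locally frequent in the second partition but is dropped after the global count.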
Zakrzewicz M.: Itemset Materializing for Fast Mining of Association Rules
- Proc. of the 2nd ADBIS Conference
, 1998
Cited by 8 (6 self)
Abstract. Mining association rules is an important data mining problem. Association rules are usually mined repeatedly in different parts of a database. Current algorithms for mining association rules work in two steps. First, the most frequently occurring sets of items are discovered; then these sets are used to generate the association rules. The first step usually requires repeated passes over the analyzed database and determines the overall performance. In this paper, we present a new method that addresses the issue of discovering the most frequently occurring sets of items. Our method consists of materializing precomputed sets of items discovered in logical database partitions. We show that the materialized sets can be repeatedly used to efficiently generate the most frequently occurring sets of items. Using this approach, the required association rules can be mined with only one scan of the database. Our experiments show that the proposed method significantly outperforms well-known algorithms.
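One way to picture reusing materialized itemsets is sketched below. It is a hypothetical illustration, not the method from the paper: each logical partition stores its locally frequent itemsets with their counts (found by brute force up to `max_size` items, an assumption that only suits toy data), and a later query over any subset of partitions takes its candidates from the store and scans a partition only when a candidate's count is missing there. The sketch assumes the stored and queried minimum supports are equal; a query with a lower threshold could miss itemsets that were never materialized.

```python
from itertools import combinations


def count_itemsets(transactions, max_size):
    """Brute-force count of every itemset with up to max_size items (toy-sized data only)."""
    counts = {}
    for items in transactions:
        distinct = sorted(set(items))
        for k in range(1, max_size + 1):
            for combo in combinations(distinct, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def materialize(partitions, min_pct, max_size=3):
    """Precompute and store each logical partition's locally frequent itemsets with counts."""
    store = []
    for part in partitions:
        need = -(-min_pct * len(part) // 100)  # integer ceiling of min_pct% of the partition
        counts = count_itemsets(part, max_size)
        store.append({s: c for s, c in counts.items() if c >= need})
    return store


def mine_from_store(partitions, store, selected, min_pct):
    """Frequent itemsets over the union of the selected partitions, reusing stored counts.

    Each selected partition is scanned at most once, and only if it lacks a stored
    count for some candidate (meaning the candidate was locally infrequent there)."""
    candidates = set().union(*(store[i] for i in selected))
    totals = dict.fromkeys(candidates, 0)
    for i in selected:
        missing = []
        for c in candidates:
            if c in store[i]:
                totals[c] += store[i][c]  # reuse the materialized count
            else:
                missing.append(c)
        if missing:
            for items in partitions[i]:  # a single pass covers every missing candidate
                present = set(items)
                for c in missing:
                    if c <= present:
                        totals[c] += 1
    n = sum(len(partitions[i]) for i in selected)
    return {c: k for c, k in totals.items() if k * 100 >= min_pct * n}


# Hypothetical example: two logical partitions of a small basket database.
PARTS = [
    [["bread", "milk"], ["bread", "diapers", "beer", "eggs"], ["milk", "diapers", "beer", "cola"]],
    [["bread", "milk", "diapers", "beer"], ["bread", "milk", "diapers", "cola"]],
]
STORE = materialize(PARTS, min_pct=60)
both = mine_from_store(PARTS, STORE, selected=[0, 1], min_pct=60)
second_only = mine_from_store(PARTS, STORE, selected=[1], min_pct=60)
```

The query over the second partition alone is answered entirely from the store, with no scan at all, because every candidate already has a materialized count there.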
Vice President of Research and Development
The discovery of new compounds with important biological activities is just the first step on a long road to a pharmaceutical product. Frequently, finding the new compound with interesting biological activity is the easiest part of the entire process. In most cases the new compound itself is not suitable for development as a therapeutic. The compound may lack the stability, selectivity, potency, or any one of a myriad of other properties needed for a successful therapeutic. This lecture will focus on three cases illustrating different therapeutic requirements for compounds with deficiencies, and how the problems were solved. The first case is a peptide (the Arg-Gly-Asp cell adhesion epitope) that had potential as an anti-inflammatory therapeutic but was metabolized within seconds, far too fast to be useful. The requirement here was to find a potent non-peptide organic molecule with the same receptor specificity, higher potency, and a longer half-life. The second case, the potent anticancer agent doxorubicin, is in common clinical practice. Doxorubicin does not have a wide therapeutic window, and its utility for cancer patients is limited by side effects (a low therapeutic index). If one could find a way to give it more specificity, so that only cancer cells are affected, the effectiveness of the drug would be greatly improved. A prodrug of doxorubicin that is only activated by the cancer cell could
Techniques in Data Mining: Decision Trees Classification and Constraint-based Itemsets Mining
, 2001
Classification and association rule mining are two important data mining techniques that complement each other. Decision tree classification is a supervised learning technique that requires a training dataset to build a classifier, while itemset mining is an unsupervised learning technique that requires no a priori knowledge. Both are essential to practical applications. In this thesis, we aim to improve these two techniques for large databases. Classification has been widely used to assist decision-making processes in various applications. Among the techniques for classification, decision trees have recently attracted the most attention due to their conceptual simplicity and accuracy. In the first half of this thesis, we investigate several strategies to speed up decision tree construction under a database-oriented constraint: main memory is limited and usually much smaller than the dataset. Our methods for building decision trees are all based on pre-sorting. We pay particular attention to minimizing I/O operations under limited memory. Our study shows that by emphasizing different aspects, such as the order of hashing, the allocation of memory buffers, the amount of disk space, and the trade-off between I/O and CPU costs, we can obtain schemes with different performance characteristics that meet the requirements of different applications.
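The pre-sorting idea behind such decision tree builders can be illustrated with a single numeric attribute: once its (value, class) pairs are sorted, one linear scan evaluates every candidate threshold using the Gini index. This is a generic sketch in the style of pre-sorting classifiers such as SLIQ and SPRINT, with hypothetical data and names; it does not model the thesis's memory, hashing, or I/O schemes.

```python
from collections import Counter


def gini(counts):
    """Gini impurity of a class-count distribution."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(attribute_list):
    """Best binary split 'value <= threshold' for one numeric attribute.

    attribute_list holds (value, class_label) pairs already sorted by value, so a single
    linear scan evaluates every candidate threshold without re-sorting."""
    total = Counter(label for _, label in attribute_list)
    left = Counter()
    n = len(attribute_list)
    best_threshold, best_score = None, float("inf")
    for i in range(n - 1):
        value, label = attribute_list[i]
        left[label] += 1
        if value == attribute_list[i + 1][0]:
            continue  # no split point between equal values
        right = total - left
        k = i + 1
        score = k / n * gini(left) + (n - k) / n * gini(right)
        if score < best_score:
            best_threshold, best_score = value, score
    return best_threshold, best_score


# Pre-sort once (hypothetical toy data: age -> risk class); child nodes can reuse the order.
AGES = sorted([(23, "high"), (17, "high"), (43, "high"), (68, "low"), (32, "low"), (20, "high")])
threshold, score = best_split(AGES)
```

For this data the best split is age <= 23, with a weighted Gini of 2/9: the left side is pure and only the right side mixes classes.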
ADMiRe: An Algebraic Approach to System Performance Analysis Using Data Mining Techniques
, 2003
System performance analysis is a very difficult problem. Traditional tools rely on manual operations to analyze data. Consequently, determining which system resources to examine is often a lengthy process, and many problems remain elusive even when data mining tools are used. We address this problem by introducing the Analyzer for Data Mining Results (ADMiRe) technique as a natural and flexible means to further interpret data mining outcomes. In our scheme, regression analysis is first applied to performance data to discover correlations between parameters. Regression rules are defined to represent this output in a format suitable for ADMiRe. ADMiRe expressions are then used to manipulate these sets of rules, revealing information about the combined, common, and different features of varying configurations. This knowledge would be unavailable if regression output were considered in isolation. ADMiRe was tested with performance data collected from a TPC-C benchmark (TPC: Transaction Processing Performance Council) run on an Oracle database system under various configurations to demonstrate the effectiveness of our technique.
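The rule-algebra idea can be sketched with ordinary set operations, under several assumptions that are not from the paper: a regression rule is reduced to a hypothetical (response, predictor, slope sign) triple, a rule is kept when the absolute Pearson correlation reaches a hypothetical threshold of 0.8 (the sign of r matches the sign of the least-squares slope), and union, intersection and difference stand in for ADMiRe's combined, common and different expressions. The measurements are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegressionRule:
    """Hypothetical rule form: response moves in `direction` as predictor grows."""
    response: str
    predictor: str
    direction: str  # "+" or "-", the sign of the fitted slope


def pearson(xs, ys):
    """Pearson correlation; its sign equals the sign of the least-squares slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear relationship
    return sxy / (sxx * syy) ** 0.5


def extract_rules(samples, response, min_abs_r=0.8):
    """Turn one configuration's measurements into a set of regression rules."""
    rules = set()
    for name, xs in samples.items():
        if name == response:
            continue
        r = pearson(xs, samples[response])
        if abs(r) >= min_abs_r:
            rules.add(RegressionRule(response, name, "+" if r > 0 else "-"))
    return rules


# Hypothetical measurements from two configurations (not TPC-C results).
config_a = {"tps": [100, 200, 300, 400], "cpu": [20, 40, 60, 80],
            "io_wait": [50, 40, 30, 20], "mem": [5, 7, 5, 7]}
config_b = {"tps": [100, 200, 300, 400], "cpu": [20, 40, 60, 80],
            "io_wait": [30, 30, 31, 30], "locks": [1, 2, 3, 4]}
rules_a = extract_rules(config_a, "tps")
rules_b = extract_rules(config_b, "tps")

# Set algebra over rule sets: combined, common and differing features.
combined = rules_a | rules_b
common = rules_a & rules_b
only_a = rules_a - rules_b
```

With this toy data, CPU utilization rising with throughput is common to both configurations, while the falling I/O wait appears only in the first, which is the kind of cross-configuration contrast a single regression run would not show.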

