Beyond Market Baskets: Generalizing Association Rules To Dependence Rules
1998
Cited by 489 (7 self)
One of the more well-studied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B.” Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of association rules, we develop the notion of dependence rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chi-squared test for independence from classical statistics. This leads to a measure that is upward-closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence rules. We demonstrate our algorithm’s effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.
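As a rough illustration of the chi-squared test for independence mentioned in this abstract, the sketch below computes the statistic for a single pair of items over a list of baskets. It is not the paper's algorithm: the function name `chi_squared_2x2` and the basket representation (Python sets of item labels) are assumptions made for the example, and the lattice search and pruning strategies are not shown.

```python
from itertools import product


def chi_squared_2x2(baskets, a, b):
    """Chi-squared statistic for independence of items a and b.

    Hypothetical helper: `baskets` is a list of sets of item labels.
    """
    n = len(baskets)
    # Observed counts for each presence/absence combination of (a, b).
    observed = {cell: 0 for cell in product((True, False), repeat=2)}
    for basket in baskets:
        observed[(a in basket, b in basket)] += 1

    count_a = observed[(True, True)] + observed[(True, False)]
    count_b = observed[(True, True)] + observed[(False, True)]

    stat = 0.0
    for has_a, has_b in observed:
        # Expected count under independence: row total * column total / n.
        row_total = count_a if has_a else n - count_a
        col_total = count_b if has_b else n - count_b
        expected = row_total * col_total / n
        if expected > 0:
            stat += (observed[(has_a, has_b)] - expected) ** 2 / expected
    return stat
```

For a pair of items the table has one degree of freedom, so a statistic above roughly 3.84 would be judged dependent at the 95% level. Note that the cells where an item is absent contribute to the statistic, which matches the abstract's emphasis on both presence and absence of items.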
Multiple uses of frequent sets and condensed representations (Extended Abstract)
In Proc. KDD Int. Conf. Knowledge Discovery in Databases, 1996
Cited by 89 (7 self)
In interactive data mining it is advantageous to have condensed representations of data that can be used to efficiently answer different queries. In this paper we show how frequent sets can be used as a condensed representation for answering various types of queries. Given a table r with 0/1 values and a threshold σ, a frequent set of r is a set X of columns of r such that at least a fraction σ of the rows of r have a 1 in all the columns of X. Finding frequent sets is a first step in finding association rules, and there exist several efficient algorithms for finding the frequent sets. We show that frequent sets have wider applications than just finding association rules. We show that using the inclusion-exclusion principle one can obtain approximate confidences of arbitrary boolean rules. We derive bounds for the errors in the confidences, and show that information collected during the computation of frequent sets can also be used to provide individual error bounds for each clause...
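The definition of a frequent set given in this abstract, and the way stored frequencies can answer a further query by inclusion-exclusion, can be sketched as follows. This is a brute-force illustration under stated assumptions, not one of the efficient algorithms the abstract refers to: the function names and the list-of-lists table layout are hypothetical, and the threshold parameter is called `sigma` here.

```python
from itertools import combinations


def frequency(rows, columns):
    """Fraction of rows that have a 1 in every column of `columns`."""
    return sum(all(row[c] == 1 for c in columns) for row in rows) / len(rows)


def frequent_sets(rows, sigma):
    """Map each frequent column set (as a sorted tuple) to its frequency.

    Brute force over all column subsets; only suitable for tiny tables.
    """
    n_cols = len(rows[0])
    result = {}
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            f = frequency(rows, cols)
            if f >= sigma:
                result[cols] = f
    return result


def frequency_of_disjunction(freqs, a, b):
    """Inclusion-exclusion: f(a or b) = f(a) + f(b) - f(a and b)."""
    return freqs[(a,)] + freqs[(b,)] - freqs[tuple(sorted((a, b)))]
```

The disjunction is answered from the stored frequencies alone, without rescanning the table. In this sketch a lookup fails with a `KeyError` when a needed set fell below the threshold; terms that are unavailable in this way are the reason such answers are approximate in general.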
Scalable Techniques for Mining Causal Structures
Data Mining and Knowledge Discovery, 1998
Cited by 88 (1 self)
Mining for association rules in market basket data has proved a fruitful area of research. Measures such as conditional probability (confidence) and correlation have been used to infer rules of the form "the existence of item A implies the existence of item B." However, such rules indicate only a statistical relationship between A and B. They do not specify the nature of the relationship: whether the presence of A causes the presence of B, or the converse, or some other attribute or phenomenon causes both to appear together. In applications, knowing such causal relationships is extremely useful for enhancing understanding and effecting change. While distinguishing causality from correlation is a truly difficult problem, recent work in statistics and Bayesian learning provides some avenues of attack. In these fields, the goal has generally been to learn complete causal models, which are essentially impossible to learn in large-scale data mining applications with a large number of variab...
Knowledge discovery and interestingness measures: A survey
1999
Cited by 48 (1 self)
Knowledge discovery in databases, also known as data mining, is the efficient discovery of previously unknown, valid, novel, potentially useful, and understandable patterns in large databases. It encompasses many different techniques and algorithms which differ in the kinds of data that can be analyzed and the form of knowledge representation used to convey the discovered knowledge. An important problem in the area of data mining is the development of effective measures of interestingness for ranking the discovered knowledge. In this report, we provide a general overview of the more successful and widely known data mining techniques and algorithms, and survey seventeen interestingness measures from the literature that have been successfully employed in data mining applications.
A Microeconomic View of Data Mining
1998
Cited by 43 (2 self)
We present a rigorous framework, based on optimization, for evaluating data mining operations such as associations and clustering, in terms of their utility in decision-making. This framework leads quickly to some interesting computational problems related to sensitivity analysis, segmentation and the theory of games.
Author affiliations: Department of Computer Science, Cornell University, Ithaca NY 14853. Email: kleinber@cs.cornell.edu. Supported in part by an Alfred P. Sloan Research Fellowship and by NSF Faculty Early Career Development Award CCR9701399. Computer Science Division, Soda Hall, UC Berkeley, CA 94720. christos@cs.berkeley.edu. IBM Almaden Research Center, 650 Harry Road, San Jose CA 95120. pragh@almaden.ibm.com.
1 Introduction. Data mining is about extracting interesting patterns from raw data. There is some agreement in the literature on what qualifies as a "pattern" (association rules and correlations [1, 2, 3, 5, 6, 12, 20, 21] as well as clustering of the data points [9], are ...
Maintenance of Discovered Association Rules: When to update?
In Research Issues on Data Mining and Knowledge Discovery, 1997
Cited by 24 (1 self)
In this paper, we devise an algorithm with which we can estimate the difference between the association rules in a database before and after it is updated. The estimated difference can be used to determine whether we update the mined association rules or not. If the estimated difference is large, then it is time to update the mined association rules in order to discover and learn the new rules and discard the old ones. If the estimated difference is small, then the rules in the original database are still a good approximation for those in the updated database, and we do not have to spend the resources to update the rules. We can accumulate more updates before actually updating the rules, thereby avoiding the overheads of updating the rules too frequently.
1 Introduction. Data mining has been attracting much attention from practitioners and researchers in recent years. Combining techniques from the fields of machine learning, statistics and databases, data mining enables us to find out usef...
Mining Patterns from Graph Traversals
Data and Knowledge Engineering, 2001
Cited by 21 (3 self)
In data models that have graph representations, users navigate by following the links of the graph structure. Conducting data mining on collected information about user accesses in such models involves the determination of frequently occurring access sequences. In this paper, we examine the problem of finding traversal patterns from such collections. The determination of patterns is based on the graph structure of the model. For this purpose, we present three algorithms, one which is levelwise with respect to the lengths of the patterns and two which are not. Additionally, we consider the fact that accesses within patterns may be interleaved with random accesses due to navigational purposes. The definition of the pattern type generalizes existing ones in order to take this fact into account. The performance of all algorithms and their sensitivity to several parameters are examined experimentally.
Data Mining in Large Databases Using Domain Generalization Graphs
Journal of Intelligent Information Systems, 1999
Cited by 13 (3 self)
Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules.
Data Mining and Knowledge Discovery, 1998
Cited by 13 (0 self)
By nature, sampling is an appealing technique for data mining, because approximate solutions may in most cases already satisfy the needs of the users. We attempt to use sampling techniques to address the problem of maintaining discovered association rules. Some studies have been done on the problem of maintaining the discovered association rules when updates are made to the database. All proposed methods must examine not only the changed part but also the unchanged part in the original database, which is very large, and hence take much time. Worse yet, if the updates on the rules are performed frequently on the database but the underlying rule set has not changed much, then the effort could be mostly wasted. In this paper, we devise an algorithm which employs sampling techniques to estimate the difference between the association rules in a database before and after the database is updated. The estimated difference can be used to determine whether we should update the...
Mining Association Rules From Market Basket Data Using Share Measures And Characterized Itemsets
1998
Cited by 11 (4 self)
The problem of mining association rules from market basket data has recently been an important research topic in the area of knowledge discovery from databases. It was originally introduced in [2] and studied extensively in [1, 5, 25, 26, 31, 19, 23, 29, 30, 3, 4, 33, 14]. The problem is typically examined in the context of discovering buying patterns from retail sales transactions. Although there are many similar data mining applications which can be modelled in this way, we again study the problem using the retail store example because of its intuitive nature and clarity. Consider a retail sales operation with a large inventory consisting of many different products. The operation is situated in a location where the customer base is socioeconomically diverse, with annual household incomes ranging from very low to very high, and demographically ranging from young families to the elderly. The sales manager has used data mining to search for association ru...