Results 1-10 of 13
Scalable Techniques for Mining Causal Structures
Data Mining and Knowledge Discovery, 1998
Abstract

Cited by 105 (1 self)
Mining for association rules in market basket data has proved a fruitful area of research. Measures such as conditional probability (confidence) and correlation have been used to infer rules of the form "the existence of item A implies the existence of item B." However, such rules indicate only a statistical relationship between A and B. They do not specify the nature of the relationship: whether the presence of A causes the presence of B, or the converse, or some other attribute or phenomenon causes both to appear together. In applications, knowing such causal relationships is extremely useful for enhancing understanding and effecting change. While distinguishing causality from correlation is a truly difficult problem, recent work in statistics and Bayesian learning provides some avenues of attack. In these fields, the goal has generally been to learn complete causal models, which are essentially impossible to learn in large-scale data mining applications with a large number of variab...
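To make the distinction in this abstract concrete, the following is a minimal Python sketch of the two measures it names, confidence and correlation. It is not taken from the paper: the function names, the basket representation (a list of sets), and the choice of the phi coefficient as the correlation measure are all assumptions made for illustration.

```python
from math import sqrt

def confidence(baskets, a, b):
    """Estimate P(b in basket | a in basket) from a list of item sets."""
    with_a = [t for t in baskets if a in t]
    if not with_a:
        return 0.0
    return sum(1 for t in with_a if b in t) / len(with_a)

def phi_correlation(baskets, a, b):
    """Phi coefficient between the presence of a and the presence of b.

    Assumption: phi is used here as one simple correlation measure;
    the abstract does not say which measure the authors have in mind.
    """
    n = len(baskets)
    pa = sum(1 for t in baskets if a in t) / n
    pb = sum(1 for t in baskets if b in t) / n
    pab = sum(1 for t in baskets if a in t and b in t) / n
    denom = sqrt(pa * (1 - pa) * pb * (1 - pb))
    return (pab - pa * pb) / denom if denom else 0.0
```

Confidence depends on the direction of the rule, while the correlation value is the same for (A, B) and (B, A). Neither number says whether A causes B, B causes A, or a third factor drives both, which is the gap the abstract describes.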
Multiple uses of frequent sets and condensed representations (Extended Abstract)
In Proc. KDD Int. Conf. Knowledge Discovery in Databases, 1996
Abstract

Cited by 98 (8 self)
In interactive data mining it is advantageous to have condensed representations of data that can be used to efficiently answer different queries. In this paper we show how frequent sets can be used as a condensed representation for answering various types of queries. Given a table r with 0/1 values and a threshold σ, a frequent set of r is a set X of columns of r such that at least a fraction σ of the rows of r have a 1 in all the columns of X. Finding frequent sets is a first step in finding association rules, and there exist several efficient algorithms for finding the frequent sets. We show that frequent sets have wider applications than just finding association rules. We show that using the inclusion-exclusion principle one can obtain approximate confidences of arbitrary Boolean rules. We derive bounds for the errors in the confidences, and show that information collected during the computation of frequent sets can also be used to provide individual error bounds for each clause...
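The definition of a frequent set and the inclusion-exclusion idea in this abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's algorithm: rows are assumed to be sets of the column names that hold a 1, the support threshold is called `sigma`, and the function names are hypothetical. Treating the support of an infrequent set as 0 is one simple way to get an approximation; the paper's error bounds are not reproduced here.

```python
from itertools import combinations

def frequent_sets(rows, sigma):
    """Return {frozenset(columns): support} for every column set whose
    support (fraction of rows with a 1 in all its columns) is >= sigma."""
    n = len(rows)
    items = sorted({c for row in rows for c in row})
    result = {}
    level = [frozenset([c]) for c in items]
    while level:
        size = len(level[0]) + 1
        survivors = []
        for cand in level:
            support = sum(1 for row in rows if cand <= row) / n
            if support >= sigma:
                result[cand] = support
                survivors.append(cand)
        # Join frequent sets that differ in one column to form the next level.
        level = list({a | b for a, b in combinations(survivors, 2)
                      if len(a | b) == size})
    return result

def disjunction_support(freq, columns):
    """Approximate the fraction of rows with a 1 in at least one column,
    using inclusion-exclusion over the stored frequent-set supports.
    Assumption: sets missing from freq (infrequent) contribute 0."""
    total = 0.0
    for k in range(1, len(columns) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(columns, k):
            total += sign * freq.get(frozenset(subset), 0.0)
    return total
```

When every term in the inclusion-exclusion sum is itself a frequent set, the result is exact; otherwise the missing terms are the source of the approximation error the abstract refers to.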
A Microeconomic View of Data Mining
1998
Abstract

Cited by 51 (2 self)
We present a rigorous framework, based on optimization, for evaluating data mining operations such as associations and clustering, in terms of their utility in decision-making. This framework leads quickly to some interesting computational problems related to sensitivity analysis, segmentation and the theory of games.

Author affiliations: Department of Computer Science, Cornell University, Ithaca NY 14853, email kleinber@cs.cornell.edu, supported in part by an Alfred P. Sloan Research Fellowship and by NSF Faculty Early Career Development Award CCR-9701399; Computer Science Division, Soda Hall, UC Berkeley, CA 94720, christos@cs.berkeley.edu; IBM Almaden Research Center, 650 Harry Road, San Jose CA 95120, pragh@almaden.ibm.com.

1 Introduction. Data mining is about extracting interesting patterns from raw data. There is some agreement in the literature on what qualifies as a "pattern" (association rules and correlations [1, 2, 3, 5, 6, 12, 20, 21] as well as clustering of the data points [9], are ...
Knowledge discovery and interestingness measures: A survey
1999
Abstract

Cited by 50 (1 self)
Knowledge discovery in databases, also known as data mining, is the efficient discovery of previously unknown, valid, novel, potentially useful, and understandable patterns in large databases. It encompasses many different techniques and algorithms which differ in the kinds of data that can be analyzed and the form of knowledge representation used to convey the discovered knowledge. An important problem in the area of data mining is the development of effective measures of interestingness for ranking the discovered knowledge. In this report, we provide a general overview of the more successful and widely known data mining techniques and algorithms, and survey seventeen interestingness measures from the literature that have been successfully employed in data mining applications.
Mining Patterns from Graph Traversals
Data and Knowledge Engineering, 2001
Abstract

Cited by 28 (3 self)
In data models that have graph representations, users navigate by following the links of the graph structure. Conducting data mining on collected information about user accesses in such models involves the determination of frequently occurring access sequences. In this paper, we examine the problem of finding traversal patterns from such collections. The determination of patterns is based on the graph structure of the model. For this purpose, we present three algorithms, one which is levelwise with respect to the lengths of the patterns and two which are not. Additionally, we consider the fact that accesses within patterns may be interleaved with random accesses due to navigational purposes. The definition of the pattern type generalizes existing ones in order to take this fact into account. The performance of all algorithms and their sensitivity to several parameters are examined experimentally.
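As a rough illustration of the pattern type described in this abstract, the Python sketch below counts how many recorded sessions contain a candidate traversal pattern. It is an assumption-laden sketch, not one of the paper's three algorithms: sessions are assumed to be lists of node names, the graph is assumed to be a set of directed edge pairs, a pattern is required to follow graph edges, and interleaved accesses are handled by allowing the pattern to occur as a non-contiguous subsequence. All function names are hypothetical.

```python
def is_subsequence(pattern, session):
    """True if pattern occurs in session in order, possibly with other
    (interleaved) accesses in between."""
    remaining = iter(session)
    # "node in remaining" advances the iterator past the first match.
    return all(node in remaining for node in pattern)

def follows_graph(pattern, edges):
    """True if every consecutive pair of pattern nodes is an edge of the graph."""
    return all((u, v) in edges for u, v in zip(pattern, pattern[1:]))

def pattern_support(pattern, sessions, edges):
    """Number of sessions containing the pattern; patterns that do not
    follow the graph structure are given support 0."""
    if not follows_graph(pattern, edges):
        return 0
    return sum(1 for s in sessions if is_subsequence(pattern, s))
```

A levelwise miner in this style would grow patterns one node at a time along graph edges and keep only those whose support meets a threshold; that loop is omitted here.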
Maintenance of Discovered Association Rules: When to update?
In Research Issues on Data Mining and Knowledge Discovery, 1997
Abstract

Cited by 25 (1 self)
In this paper, we devise an algorithm with which we can estimate the difference between the association rules in a database before and after it is updated. The estimated difference can be used to determine whether or not we should update the mined association rules. If the estimated difference is large, then it is time to update the mined association rules in order to discover and learn the new rules and discard the old ones. If the estimated difference is small, then the rules in the original database are still a good approximation for those in the updated database. We do not have to spend the resources to update the rules. We can accumulate more updates before actually updating the rules, thereby avoiding the overheads of updating the rules too frequently. 1 Introduction. Data mining has been attracting much attention from practitioners and researchers in recent years. Combining techniques from the fields of machine learning, statistics and databases, data mining enables us to find out usef...
Data Mining in Large Databases Using Domain Generalization Graphs
Journal of Intelligent Information Systems, 1999
Abstract

Cited by 16 (4 self)
Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
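A single generalization step and a variance-based score can be sketched as follows. This Python sketch rests on several assumptions that are not in the abstract: the concept hierarchy is a flat child-to-parent dictionary shared by all attributes (a real domain generalization graph is richer), the function names are hypothetical, and the variance measure is taken to be the variance of the summary's tuple probabilities around the uniform probability 1/m, divided by m - 1.

```python
from collections import Counter

def generalize(rows, hierarchy):
    """Replace each attribute value by its parent concept (if it has one)
    and count how many original rows fall into each generalized tuple."""
    return Counter(tuple(hierarchy.get(v, v) for v in row) for row in rows)

def variance_from_uniform(summary):
    """Assumed variance-style interestingness score: how far the tuple
    probabilities of a summary are from a uniform distribution."""
    total = sum(summary.values())
    m = len(summary)
    if m < 2:
        return 0.0
    q = 1 / m  # probability of each tuple under a uniform distribution
    return sum((count / total - q) ** 2 for count in summary.values()) / (m - 1)
```

Applying `generalize` repeatedly with successively coarser hierarchies yields a family of summaries, which could then be ranked by the score; the relative-entropy measure mentioned in the abstract is not sketched here.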
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules.
Data Mining and Knowledge Discovery, 1998
Abstract

Cited by 14 (0 self)
By nature, sampling is an appealing technique for data mining, because in most cases approximate solutions may already satisfy the needs of the users. We attempt to use sampling techniques to address the problem of maintaining discovered association rules. Some studies have been done on the problem of maintaining the discovered association rules when updates are made to the database. All proposed methods must examine not only the changed part but also the unchanged part of the original database, which is very large, and hence take much time. Worse yet, if the updates on the rules are performed frequently on the database but the underlying rule set has not changed much, then the effort could be mostly wasted. In this paper, we devise an algorithm which employs sampling techniques to estimate the difference between the association rules in a database before and after the database is updated. The estimated difference can be used to determine whether we should update the...
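The idea of using a sample to decide whether re-mining is worthwhile can be illustrated with a short Python sketch. This is not the algorithm from the paper, whose details are not given in the abstract. It assumes a simplified notion of "difference" (the fraction of previously large itemsets whose sampled support falls below the minimum support), a database stored as a list of item sets, and hypothetical names such as `estimated_difference`, `should_update`, and `tolerance`.

```python
import random

def estimated_difference(old_large_itemsets, updated_db, min_support,
                         sample_size, rng):
    """Estimate, from a random sample of the updated database, the fraction
    of previously large itemsets that no longer reach min_support."""
    if not old_large_itemsets or not updated_db:
        return 0.0
    sample = rng.sample(updated_db, min(sample_size, len(updated_db)))
    dropped = 0
    for itemset in old_large_itemsets:
        support = sum(1 for t in sample if itemset <= t) / len(sample)
        if support < min_support:
            dropped += 1
    return dropped / len(old_large_itemsets)

def should_update(difference, tolerance):
    """Re-mine only when the estimated difference exceeds the tolerance."""
    return difference > tolerance
```

A caller would pass a seeded generator such as `random.Random(0)` for repeatable estimates. A fuller treatment would also account for itemsets that become large after the update and would attach a confidence interval to the estimate.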
Mining Association Rules From Market Basket Data Using Share Measures And Characterized Itemsets
1998
CHAPTER 16 MINING ENCRYPTED DATA
Abstract
Business and scientific organizations nowadays own databases containing confidential information that needs to be analyzed, through data mining techniques, in order to support their planning activities. The need for privacy is imposed either by legal restrictions (for medical and socioeconomic databases) or by the unwillingness of business organizations to share their data, which are considered a valuable asset. Despite the diffusion of data mining techniques, the key problem of confidentiality has not been considered until very recently. In this chapter we address the issue of mining encrypted data, in order both to protect confidential information and to allow knowledge discovery. More specifically, we consider a scenario where a company having private databases negotiates a deal with a consultant. The company wishes the consultant to analyze its databases through data mining techniques. Yet the