A General Incremental Technique for Maintaining Discovered Association Rules
- In Proceedings of the Fifth International Conference On Database Systems For Advanced Applications
, 1997
Cited by 110 (5 self)
A more general incremental updating technique is developed for maintaining the association rules discovered in a database when transactions are inserted, deleted, or modified. A previously proposed algorithm, FUP, can only handle the maintenance problem in the case of insertion. The proposed algorithm, FUP2, makes use of the previous mining result to cut down the cost of finding the new rules in an updated database. In the insertion-only case, FUP2 is equivalent to FUP. In the deletion-only case, FUP2 is a complementary algorithm to FUP and is very efficient when the deleted transactions are a small part of the database, which is the most applicable case. In the general case, FUP2 can efficiently update the discovered rules when new transactions are added to a transaction database and obsolete transactions are removed from it. The proposed algorithm has been implemented and its performance is studied and compared with the best algorithms for mining...
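The idea of reusing a previous mining result can be illustrated with a small sketch. This is not the FUP2 algorithm itself, only an assumed minimal example: the function name `updated_support` and the data layout (transactions as sets of item names) are hypothetical. It updates a stored support count by scanning only the inserted and deleted transactions.

```python
def updated_support(old_count, itemset, inserted, deleted):
    """Update an itemset's support count after a batch of changes.

    Only the inserted and deleted transactions are scanned; the original
    database is not read again. Illustrative sketch, not FUP2.
    """
    items = frozenset(itemset)
    gained = sum(1 for t in inserted if items <= set(t))
    lost = sum(1 for t in deleted if items <= set(t))
    return old_count - lost + gained


# Hypothetical data: {bread, milk} was counted 40 times in the old database.
inserted = [{"bread", "milk", "eggs"}, {"milk"}]
deleted = [{"bread", "milk"}, {"bread", "milk", "jam"}, {"eggs"}]
new_count = updated_support(40, {"bread", "milk"}, inserted, deleted)
print(new_count)
```

An itemset with no stored count from the previous run cannot be updated this way, so this sketch covers only the itemsets that were already counted.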
Efficient Mining of Association Rules in Distributed Databases
, 1996
Cited by 98 (3 self)
Many sequential algorithms have been proposed for mining association rules. However, very little work has been done on mining association rules in distributed databases. A direct application of sequential algorithms to distributed databases is not effective, because it requires a large amount of communication overhead. In this study, an efficient algorithm, DMA, is proposed. It generates a small number of candidate sets and requires only O(n) messages for support count exchange for each candidate set, where n is the number of sites in a distributed database. The algorithm has been implemented on an experimental test bed and its performance is studied. The results show that DMA has superior performance compared with the direct application of a popular sequential algorithm in distributed databases.
Multiple uses of frequent sets and condensed representations
- Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining
, 1996
Cited by 98 (8 self)
In interactive data mining it is advantageous to have condensed representations of data that can be used to efficiently answer different queries. In this paper we show how frequent sets can be used as a condensed representation for answering various types of queries. Given a table r with 0/1 values and a threshold σ, a frequent set of r is a set X of columns of r such that at least a fraction σ of the rows of r have a 1 in all the columns of X. Finding frequent sets is a first step in finding association rules, and there exist several efficient algorithms for finding the frequent sets. We show that frequent sets have wider applications than just finding association rules. We show that using the inclusion-exclusion principle one can obtain approximate confidences of arbitrary boolean rules. We derive bounds for the errors in the confidences, and show that information collected during the computation of frequent sets can also be used to provide individual error bounds for each clause. Experiments show that this method enables one to obtain different forms ...
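The definition of a frequent set and the use of inclusion-exclusion can be sketched as follows. This is a minimal illustration under assumptions: the function names, the dictionary-per-row table layout, and the toy data are hypothetical, and the error bounds mentioned in the abstract are not modeled.

```python
def frequency(rows, columns):
    """Fraction of rows that have a 1 in every column of `columns`."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if all(row[c] == 1 for c in columns))
    return hits / len(rows)


def is_frequent(rows, columns, sigma):
    """True if `columns` is a frequent set of the table at threshold sigma."""
    return frequency(rows, columns) >= sigma


def disjunction_frequency(rows, a, b):
    """Frequency of rows with a 1 in column a OR column b, computed only
    from conjunction frequencies via inclusion-exclusion."""
    return frequency(rows, [a]) + frequency(rows, [b]) - frequency(rows, [a, b])


# Hypothetical 0/1 table with columns A, B, C.
rows = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 0, "C": 1},
    {"A": 0, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 0},
]
print(frequency(rows, ["A", "B"]))
print(is_frequent(rows, ["A"], 0.5))
print(disjunction_frequency(rows, "A", "B"))
```

In this toy table every needed conjunction frequency is available, so the disjunction frequency is exact.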
Constructing Knowledge From Multivariate Spatiotemporal Data: Integrating Geographic Visualization (GVis) with Knowledge Discovery in Database (KDD) Methods
- International Journal of Geographical Information Science
, 1999
Cited by 85 (20 self)
In this paper, we develop an approach to the process of constructing knowledge through structured exploration of large spatiotemporal data sets. We begin by introducing our problem context and defining both Geographic Visualization (GVis) and Knowledge Discovery in Databases (KDD), the source domains for methods being integrated. Next, we review and compare recent GVis and KDD developments and consider the potential for their integration, emphasizing that an iterative process with user interaction is a central focus for uncovering interesting and meaningful patterns through each. We then introduce an approach to design of an integrated GVis-KDD environment directed to exploration and discovery in the context of spatiotemporal environmental data. The approach emphasizes a matching of GVis and KDD meta-operations. Following description of the GVis and KDD methods that are linked in our prototype system, we present a demonstration of the prototype applied to a typical spatiotemporal datas...
Optimization of Constrained Frequent Set Queries with 2-variable Constraints
, 1999
Cited by 81 (13 self)
Currently, there is tremendous interest in providing ad-hoc mining capabilities in database management systems. As a first step towards this goal, in [15] we proposed an architecture for supporting constraint-based, human-centered, exploratory mining of various kinds of rules including associations, introduced the notion of constrained frequent set queries (CFQs), and developed effective pruning optimizations for CFQs with 1-variable (1-var) constraints. While 1-var constraints are useful for constraining the antecedent and consequent separately, many natural examples of CFQs illustrate the need for constraining the antecedent and consequent jointly, for which 2-variable (2-var) constraints are indispensable. Developing pruning optimizations for CFQs with 2-var constraints is the subject of this paper. But this is a difficult problem because: (i) in 2-var constraints, both variables keep changing and, unlike 1-var constraints, there is no fixed target for pruning; (ii) as we show, "conventional" monotonicity-based optimization techniques do not apply effectively to 2-var constraints. The contributions are as follows. (1) We introduce a notion of quasi-succinctness, which allows a quasi-succinct 2-var constraint to be reduced to two succinct 1-var constraints for pruning. (2) We characterize the class of 2-var constraints that are quasi-succinct. (3) We develop heuristic techniques for non-quasi-succinct constraints. Experimental results show the effectiveness of all our techniques. (4) We propose a query optimizer for CFQs and show that for a large class of constraints, the computation strategy generated by the optimizer is ccc-optimal, i.e., minimizing the effort incurred w.r.t. constraint checking and support counting.
Fast Sequential and Parallel Algorithms for Association Rule Mining: A Comparison
, 1995
Cited by 74 (0 self)
The field of knowledge discovery in databases, or "Data Mining", has received increasing attention during recent years as large organizations have begun to realize the potential value of the information that is stored implicitly in their databases. One specific data mining task is the mining of Association Rules, particularly from retail data. The task is to determine patterns (or rules) that characterize the shopping behavior of customers from a large database of previous consumer transactions. The rules can then be used to focus marketing efforts such as product placement and sales promotions. Because early algorithms required an unpredictably large number of IO operations, reducing IO cost has been the primary target of the algorithms presented in the literature. One of the most recently proposed algorithms, called PARTITION, uses a new TID-list data representation and a new partitioning technique. The partitioning technique reduces IO cost to a constant amount by processing one datab...
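A TID-list representation stores, for each item, the identifiers of the transactions that contain it, so the support of an itemset is the size of the intersection of its items' lists. The sketch below illustrates only that representation, not the PARTITION algorithm or its partitioning step. The function names and the toy transactions are assumptions.

```python
from collections import defaultdict


def build_tid_lists(transactions):
    """Map each item to the set of transaction IDs (TIDs) that contain it."""
    tid_lists = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tid_lists[item].add(tid)
    return tid_lists


def support_count(tid_lists, itemset):
    """Count transactions containing every item by intersecting TID sets."""
    sets = [tid_lists.get(item, set()) for item in itemset]
    if not sets:
        return 0
    return len(set.intersection(*sets))


# Hypothetical retail transactions keyed by TID.
transactions = {
    1: {"bread", "milk"},
    2: {"bread", "butter"},
    3: {"bread", "milk", "butter"},
    4: {"milk"},
}
tid_lists = build_tid_lists(transactions)
print(support_count(tid_lists, ["bread", "milk"]))
```

Once the lists are built, counting a candidate itemset needs no further pass over the transactions themselves.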
Interestingness Measures for Association Patterns: A Perspective
, 2000
Cited by 72 (3 self)
Department of Computer Science, University of Minnesota,
Mining Fuzzy Association Rules in Databases
, 1998
Cited by 71 (0 self)
Data mining is the discovery of previously unknown, potentially useful and hidden knowledge in databases. In this paper, we concentrate on the discovery of association rules. Many algorithms have been proposed to find association rules in databases with binary attributes. We introduce fuzzy association rules of the form 'If X is A then Y is B' to deal with quantitative attributes. X and Y are sets of attributes, and A and B are fuzzy sets which describe X and Y respectively. Using the fuzzy set concept, the discovered rules are more understandable to humans. Moreover, fuzzy sets handle numerical values better than existing methods because fuzzy sets soften the effect of sharp boundaries.
1 Introduction
During the past years, boolean association rule mining has received considerable attention. Boolean association rule mining tries to find consumer behavior in retail data. A discovered rule can tell, for example, that people who buy butter and milk will also buy bread. Such rules can be used in cust...
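A rough sketch of how fuzzy sets can score a pattern such as 'age is middle and income is high' over quantitative data follows. It rests on assumptions not taken from the paper: triangular membership functions, the product as the way to combine degrees, and the mean over records as the support measure. All names and data are hypothetical.

```python
def triangular(a, b, c):
    """Return a triangular membership function that peaks at b."""
    def membership(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return membership


def fuzzy_support(records, conditions):
    """Average, over all records, of the product of membership degrees.

    `conditions` maps an attribute name to a membership function.
    """
    if not records:
        return 0.0
    total = 0.0
    for record in records:
        degree = 1.0
        for attribute, membership in conditions.items():
            degree *= membership(record[attribute])
        total += degree
    return total / len(records)


# Hypothetical fuzzy sets: "age is middle" and "income is high".
middle_age = triangular(30, 45, 60)
high_income = triangular(40, 80, 120)

records = [
    {"age": 45, "income": 80},
    {"age": 30, "income": 80},
    {"age": 45, "income": 60},
    {"age": 60, "income": 100},
]
support = fuzzy_support(records, {"age": middle_age, "income": high_income})
print(support)
```

A record near a boundary contributes a partial degree rather than being counted as fully in or fully out, which is one way to read the abstract's remark about softening sharp boundaries.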