Results 11  20
of
1,182
Discovering Generalized Episodes Using Minimal Occurrences
 In KDD ’96: Proc. 2nd International Conference on Knowledge Discovery and Data Mining
, 1996
"... Sequences of events are an important special form of data that arises in several contexts, including telecommunications, user interface studies, and epidemiology. We present a general and flexible framework of specifying classes of generalized episodes. These are recurrent combinations of events sat ..."
Abstract

Cited by 148 (8 self)
 Add to MetaCart
(Show Context)
Sequences of events are an important special form of data that arises in several contexts, including telecommunications, user interface studies, and epidemiology. We present a general and flexible framework of specifying classes of generalized episodes. These are recurrent combinations of events satisfying certain conditions. The framework can be instantiated to a wide variety of applications by selecting suitable primitive conditions. We present algorithms for discovering frequently occurring episodes and episode rules. The algorithms are based on the use of minimal occurrences of episodes; this makes it possible to evaluate confidences of a wide variety of rules using only a single analysis pass. We present empirical results on t,he behavior of t.he algorithms on events stemming from a WWW log.
PincerSearch: A New Algorithm for Discovering the Maximum Frequent Set
 In 6th Intl. Conf. Extending Database Technology
, 1997
"... Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottomup breadthfirst search direction. The computation starts from f ..."
Abstract

Cited by 127 (2 self)
 Add to MetaCart
Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottomup breadthfirst search direction. The computation starts from frequent 1itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottomup and topdown directions. The main search direction is still bottomup but a restricted search is conducted in the topdown direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidat...
Data Mining for Path Traversal Patterns in a Web Environment
, 1996
"... In this paper, we explore a new data mining capability which involves mining path traversal patterns in a distributed information providing environment like worldwideweb. First, we convert the original sequence of log data into a set of maximal forward references and filter out the effect of some ..."
Abstract

Cited by 125 (1 self)
 Add to MetaCart
In this paper, we explore a new data mining capability which involves mining path traversal patterns in a distributed information providing environment like worldwideweb. First, we convert the original sequence of log data into a set of maximal forward references and filter out the effect of some backward references which are mainly made for ease of traveling. Second, we derive algorithms to determine the frequent traversal patterns, i.e., large reference sequences, from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences: one is based on some hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. Performance of these two methods is comparatively analyzed.
Synopsis Data Structures for Massive Data Sets
"... Abstract. Massive data sets with terabytes of data are becoming commonplace. There is an increasing demand for algorithms and data structures that provide fast response times to queries on such data sets. In this paper, we describe a context for algorithmic work relevant to massive data sets and a f ..."
Abstract

Cited by 117 (13 self)
 Add to MetaCart
Abstract. Massive data sets with terabytes of data are becoming commonplace. There is an increasing demand for algorithms and data structures that provide fast response times to queries on such data sets. In this paper, we describe a context for algorithmic work relevant to massive data sets and a framework for evaluating such work. We consider the use of "synopsis" data structures, which use very little space and provide fast (typically approximated) answers to queries. The design and analysis of effective synopsis data structures o er many algorithmic challenges. We discuss a number of concrete examples of synopsis data structures, and describe fast algorithms for keeping them uptodate in the presence of online updates to the data sets.
Mining Frequent Patterns with Counting Inference
 Sigkdd Explorations
, 2000
"... ACB(D,?E= A&F"=@F"<G?8&:H?E>CI J"FCA; 8:HKMLONQPR1NQSEDT:H; U:V; W 8GA&F XHYHU?</>Z71FC["?I\F"= 8; K]; ^>C8&; F"7VF*_8&:1?`D?I I W ab71FDc7d>*I J"F*A&; 8&:1K e = A&; F*A&;gfih:1; <F"= 8; K]; ^> ..."
Abstract

Cited by 112 (9 self)
 Add to MetaCart
(Show Context)
ACB(D,?E= A&F"=@F"<G?8&:H?E>CI J"FCA; 8:HKMLONQPR1NQSEDT:H; U:V; W 8GA&F XHYHU?</>Z71FC["?I\F"= 8; K]; ^>C8&; F"7VF*_8&:1?`D?I I W ab71FDc7d>*I J"F*A&; 8&:1K e = A&; F*A&;gfih:1; <F"= 8; K]; ^>C8&; F"7; <j1>*<G?XF"7E>.7H?Dk<G8GA>C8&?J*lU>*I I ?X mHn*o opqrks&t*u rHogv r wxv rCypqpr@sp 8:1>C8TA?I ; ?<.F*7z8&:1?/UF"7HU?=H8{F*_c p} mHn*o opqrH~ f?9<G:1FD8&:@>C8]8&:H?9<GY1=H=(FCA&8xFC_`_ A?Y1?78x71F*7HWa?l =1>C8G8?A&7H<U>C7j@?x; _ ?AGA&?X_ AF*KM_ A&?bYH?7b8a?l=1>C8G8&?A&71<`DT; 8&: W F"Y 8E>*UU?<G<&; 71J98:H?ZX1>8>Cj@>C<&?"f\H=@?A&; KE?7b8&<`UF"KE=1>CA&; 71JLNP R1NS/8&F8&:1?T8: A&??`>*I J"F*A&; 8&:HK]< e = A&; F*A&;gB@,I F*<&?`>*71Xzz>CbWGZ; 71?AB <G:1FD8&:@>C8xLNQPR1NS; <]>*KEF"7HJ8&:1?ZKEF"<8?EU; ?7b8]>CI J"F*A&; 8&:HK]< _ FCA{KE; 7H; 71J`_ A?Y1?78T=1>C8G8?A&7H<f 1.
A General Incremental Technique for Maintaining Discovered Association Rules
 In Proceedings of the Fifth International Conference On Database Systems For Advanced Applications
, 1997
"... A more general incremental updating technique is developed for maintaining the association rules discovered in a database in the cases including insertion, deletion, and modification of transactions in the database. A previously proposed algorithm FUP can only handle the maintenance problem in the c ..."
Abstract

Cited by 109 (5 self)
 Add to MetaCart
(Show Context)
A more general incremental updating technique is developed for maintaining the association rules discovered in a database in the cases including insertion, deletion, and modification of transactions in the database. A previously proposed algorithm FUP can only handle the maintenance problem in the case of insertion. The proposed algorithm FUP2 makes use of the previous mining result to cut down the cost of finding the new rules in an updated database. In the insertion only case, FUP2 is equivalent to FUP. In the deletion only case, FUP2 is a complementary algorithm of FUP which is very efficient when the deleted transactions is a small part of the database, which is the most applicable case. In the general case, FUP2 can efficiently update the discovered rules when new transactions are added to a transaction database, and obsolete transactions are removed from it. The proposed algorithm has been implemented and its performance is studied and compared with the best algorithms for mining...
Cyclic association rules
 In Proc. 1998 Int. Conf. Data Engineering (ICDE'98
, 1998
"... ..."
(Show Context)
Scalable Techniques for Mining Causal Structures
 Data Mining and Knowledge Discovery
, 1998
"... Mining for association rules in market basket data has proved a fruitful area of research. Measures such as conditional probability (confidence) and correlation have been used to infer rules of the form "the existence of item A implies the existence of item B." However, such rules indicate ..."
Abstract

Cited by 106 (1 self)
 Add to MetaCart
Mining for association rules in market basket data has proved a fruitful area of research. Measures such as conditional probability (confidence) and correlation have been used to infer rules of the form "the existence of item A implies the existence of item B." However, such rules indicate only a statistical relationship between A and B. They do not specify the nature of the relationship: whether the presence of A causes the presence of B, or the converse, or some other attribute or phenomenon causes both to appear together. In applications, knowing such causal relationships is extremely useful for enhancing understanding and effecting change. While distinguishing causality from correlation is a truly difficult problem, recent work in statistics and Bayesian learning provide some avenues of attack. In these fields, the goal has generally been to learn complete causal models, which are essentially impossible to learn in largescale data mining applications with a large number of variab...
Mining Minimal NonRedundant Association Rules using Frequent Closed Itemsets
, 2000
"... The problem of the relevance and the usefulness of extracted association rules is of primary importance because, in the majority of cases, reallife databases lead to several thousands association rules with high condence and among which are many redundancies. Using the closure of the Galois con ..."
Abstract

Cited by 105 (10 self)
 Add to MetaCart
The problem of the relevance and the usefulness of extracted association rules is of primary importance because, in the majority of cases, reallife databases lead to several thousands association rules with high condence and among which are many redundancies. Using the closure of the Galois connection, we dene two new bases for association rules which union is a generating set for all valid association rules with support and condence. These bases are characterized using frequent closed itemsets and their generators; they consist of the nonredundant exact and approximate association rules having minimal antecedents and maximal consequents, i.e. the most relevant association rules. Algorithms for extracting these bases are presented and results of experiments carried out on reallife databases show that the proposed bases are useful, and that their generation is not time consuming.
Freesets: a condensed representation of Boolean data for the approximation of frequency queries
 Data Mining and Knowledge Discovery
, 2003
"... Abstract. Given a large collection of transactions containing items, a basic common data mining problem is to extract the socalled frequent itemsets (i.e., sets of items appearing in at least a given number of transactions). In this paper, we propose a structure called freesets, from which we can ..."
Abstract

Cited by 105 (20 self)
 Add to MetaCart
(Show Context)
Abstract. Given a large collection of transactions containing items, a basic common data mining problem is to extract the socalled frequent itemsets (i.e., sets of items appearing in at least a given number of transactions). In this paper, we propose a structure called freesets, from which we can approximate any itemset support (i.e., the number of transactions containing the itemset) and we formalize this notion in the framework of ɛadequate representations (H. Mannila and H. Toivonen, 1996. In Proc. of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96), pp. 189–194). We show that frequent freesets can be efficiently extracted using pruning strategies developed for frequent itemset discovery, and that they can be used to approximate the support of any frequent itemset. Experiments on real dense data sets show a significant reduction of the size of the output when compared with standard frequent itemset extraction. Furthermore, the experiments show that the extraction of frequent freesets is still possible when the extraction of frequent itemsets becomes intractable, and that the supports of the frequent freesets can be used to approximate very closely the supports of the frequent itemsets. Finally, we consider the effect of this approximation on association rules (a popular kind of patterns that can be derived from frequent itemsets) and show that the corresponding errors remain very low in practice.