## Efficient Mining of Emerging Patterns: Discovering Trends and Differences (1999)

Citations: 245 (31 self)

### BibTeX

@INPROCEEDINGS{Dong99efficientmining,
    author = {Guozhu Dong and Jinyan Li},
    title = {Efficient Mining of Emerging Patterns: Discovering Trends and Differences},
    booktitle = {},
    year = {1999},
    pages = {43--52}
}

### Abstract

We introduce a new kind of pattern, called emerging patterns (EPs), for knowledge discovery from databases. EPs are defined as itemsets whose supports increase significantly from one dataset to another. EPs can capture emerging trends in timestamped databases, or useful contrasts between data classes. EPs have been proven useful: we have used them to build very powerful classifiers, which are more accurate than C4.5 and CBA, for many datasets. We believe that EPs with low to medium support, such as 1% to 20%, can give useful new insights and guidance to experts, even in "well understood" applications. The efficient mining of EPs is a challenging problem, since (i) the Apriori property no longer holds for EPs, and (ii) there are usually too many candidates for high-dimensional databases or for small support thresholds such as 0.5%. Naive algorithms are too costly. To solve this problem, (a) we promote the description of large collections of itemsets using their concise borders (the pa...
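The definition in the abstract can be made concrete with a small sketch. The abstract is cut off before any formal definition, so the code below assumes the usual reading: the growth rate of an itemset is the ratio of its support in the second dataset to its support in the first, treated as infinite when the first support is zero and the second is not, and an itemset counts as an EP when that ratio reaches a threshold `rho`. The function names, the toy transactions, and the exhaustive enumeration are illustrative assumptions; this is the naive approach the abstract calls too costly, not the border-based algorithm.

```python
from itertools import combinations
from math import inf


def support(itemset, dataset):
    """Fraction of transactions in `dataset` containing every item of `itemset`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if itemset <= t) / len(dataset)


def growth_rate(itemset, d1, d2):
    """Assumed definition: support in d2 divided by support in d1."""
    s1, s2 = support(itemset, d1), support(itemset, d2)
    if s1 == 0:
        return 0.0 if s2 == 0 else inf
    return s2 / s1


def naive_emerging_patterns(d1, d2, rho):
    """Enumerate every non-empty itemset and keep those with growth rate >= rho.

    Exponential in the number of items; only usable on toy data.
    """
    items = sorted(set().union(*d1, *d2))
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            g = growth_rate(x, d1, d2)
            if g >= rho:
                found[x] = g
    return found


# Hypothetical toy data: each transaction is a frozenset of items.
d1 = [frozenset("ab"), frozenset("a"), frozenset("b"), frozenset("c")]
d2 = [frozenset("ab"), frozenset("ab"), frozenset("abc"), frozenset("c")]

eps = naive_emerging_patterns(d1, d2, rho=2.5)
for pattern, rate in sorted(eps.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pattern), rate)
```

On this toy data the singletons {a} and {b} have growth rate 1.5 and fall below the threshold, while their superset {a, b} has growth rate 3.0 and qualifies. This is a small instance of point (i) in the abstract: a pattern can be an EP even though its subsets are not, so Apriori-style pruning does not apply.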

### Citations

2675 | Fast algorithms for mining association rules
- Agrawal, Srikant
- 1994
Citation Context ...sets may have some nice properties that can be utilized in devising ways to efficiently process them. Previous data mining work observed and utilized some aspects of such nice properties, for example [1, 3, 16]. We go one step further by formalizing the notion of set intervals, defined as collections S of sets that are interval closed -- if X and Z are in S and Y is a set such that X ⊆ Y ⊆ Z, then Y is in S...

1178 | Mining sequential patterns
- Agrawal, Srikant
- 1995
Citation Context ...s and [8], from very different areas, share the use of the tool of borders, which indicates that these tools are really powerful. Our work is also related to the mining of regularities in time series [9, 11, 18, 2, 17, 14, 4, 12]. Our work is different in that we look for abnormal growth, instead of regularities. The rest of the paper is organized as follows: Section 2 formally defines the EP mining problem and gives a decomp...

415 | Integrating Classification and Association Rule Mining
- Liu, Hsu, et al.
- 1998
Citation Context ...tiating characteristics between classes of data. EPs have been useful: we have used EPs to build very powerful classifiers, including the Mushroom dataset, which are more accurate than C4.5 and CBA [15]. We believe that they are useful in many other applications. These patterns can be large in size, and may have very small support (e.g. a trend at the forming stage). We observed that naive algorithm...

384 | Efficiently Mining Long Patterns from Databases
- Bayardo
- 1998

237 | Discovering frequent episodes in sequences
- Mannila
- 1995

212 | Levelwise search and borders of theories in knowledge discovery
- Mannila, Toivonen
- 1997

127 | Efficient mining of partial periodic patterns in time series database
- Han, Dong, et al.
- 1999

113 | Search through systematic set enumeration
- Rymon
- 1992
Citation Context ...t these look-ahead itemsets are large, we know that all sub-itemsets of these large itemsets are large and hence there is no need to find their counts. It uses the set-enumeration trees (SE-trees) of [19] as the framework for this "look-ahead" search strategy. 4 Border-based discovery of emerging patterns Our border-based algorithms can discover all EPs in the BCDG rectangle of Figure 1, and they do t...

81 | Cyclic Association Rules
- Ozden, Ramaswamy, et al.
- 1998

75 | CAEP: Classification by Aggregating Emerging Patterns
- Dong, Zhang, et al.
- 1999
Citation Context ...4% 3.8% 21.4 Those EPs with very large growth rates are notable differentiating characteristics between the edible and poisonous Mushrooms, and they have been useful for building powerful classifiers [7, 13]. Interestingly, none of the following singleton itemsets {Odor = none}, {Gill Size = broad}, and {Ring Number = one} is an EP. Moreover, among the discovered EPs, some contain more than 8 items and t...

60 | Exploration of the power of attribute-oriented induction in data mining
- Han, Fu
- 1996
Citation Context ...bout 2^28 EPs for the growth rate threshold of 2.5; these are represented by about half a million borders.) 1.3 Related work and paper organization Although EPs are also similar to discriminant rules [10] (assertions true on instances of a given class but untrue on other instances) and evolution rules [10] in that they are all about different datasets/classes, EPs are different because they are not li...

52 | Mining Segment-Wise Periodic Patterns in Time Related Databases
- Han, Gong, et al.
- 1998

52 | Discovering trends in text databases
- Lent, Agrawal, et al.
- 1997

34 | Interestingness of discovered association rules in terms of neighbourhood-based unexpectedness
- Dong, J
- 1998

25 | Mining temporal relationships with multiple granularities in time sequences
- Bettini, Wang, et al.
- 1998

14 | Stock movement and n-dimensional inter-transaction association rules
- Lu, Han, et al.
- 1998

9 | Discovering jumping emerging patterns and experiments on real datasets
- Dong, Li, et al.
- 1999
Citation Context ...y. The discovery of strong EPs (defined as those EPs all of whose subsets are also EPs) can be done in a way similar to Apriori, by using the subset closure property. Another method was introduced in [6] for discovering another special type of EPs, called jumping EPs. Jumping EPs are special EPs whose supports increase abruptly from zero support in one dataset to non-zero support in another. In that ...

3 | JEPClassifier: Classification by Aggregating Jumping Emerging Patterns
- Li, Dong, et al.
- 1999

1 | The common order-theoretic structure of version spaces and ATMS's
- Gunter, Ngair, Subramanian
- 1997
Citation Context ... [16] uses borders directly to control level-wise search over the candidate space. Max-Miner [3] only uses one bound (the right-hand) of our large borders. We obtained all the results without knowing [8], which is concerned with efficiency issues of ATMS (assumption-based truth maintenance system). Interestingly, that paper contained some ideas similar to ours, including the representation of interva...
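The citation context for [1, 3, 16] above defines set intervals as collections S that are interval closed: whenever X and Z are in S and X ⊆ Y ⊆ Z, Y is in S as well. As a rough illustration of that property only, the sketch below checks it by brute force on a small collection. The function name and the example collections are hypothetical, and the check is exponential in the size of the gap between X and Z; it is not taken from the paper and says nothing about how borders are actually computed there.

```python
from itertools import combinations


def is_interval_closed(collection):
    """Return True if every set lying between two members is also a member."""
    sets = {frozenset(s) for s in collection}
    for x in sets:
        for z in sets:
            if x < z:  # x is a proper subset of z
                extra = list(z - x)
                # Every set strictly between x and z is x plus a non-empty,
                # proper subset of the items in z but not in x.
                for k in range(1, len(extra)):
                    for middle in combinations(extra, k):
                        if x | frozenset(middle) not in sets:
                            return False
    return True


# Hypothetical examples.
closed = [{"a"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
gappy = [{"a"}, {"a", "b", "c"}]  # {a, b} and {a, c} are missing

print(is_interval_closed(closed))  # True
print(is_interval_closed(gappy))   # False
```

An interval-closed collection like `closed` is fully determined by its minimal sets and its maximal sets, here {a} and {a, b, c}, which is the intuition behind describing large collections by a pair of bounds.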