## Mining High-Speed Data Streams (2000)


Citations: 314 (10 self)

### BibTeX

@INPROCEEDINGS{Domingos00mininghigh-speed,
  author    = {Pedro Domingos and Geoff Hulten},
  title     = {Mining High-Speed Data Streams},
  booktitle = {Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year      = {2000},
  pages     = {71--80},
  publisher = {ACM Press}
}


### Abstract

Many organizations today have more than very large databases; they have databases that grow...

### Citations

5380 | C4.5: Programs for Machine Learning
- Quinlan
- 1993

Citation Context: "... or x could be a record of a cellular-telephone call, and y the decision whether it is fraudulent or not. One of the most effective and widely-used classification methods is decision tree learning [1, 15]. Learners of this type induce models in the form of decision trees, where each node contains a test on an attribute, each branch from a node corresponds to a possible outcome of the test, and each le..."

4369 | Classification and Regression Trees
- Breiman, Friedman, et al.
- 1983

Citation Context: same passage as the C4.5 entry above (the two works are cited jointly as [1, 15]).

1555 | Probability inequalities for sums of bounded random variables
- Hoeffding
- 1963

Citation Context: "... recursively. We solve the difficult problem of deciding exactly how many examples are necessary at each node by using a statistical result known as the Hoeffding bound (or additive Chernoff bound) [7, 9]. Consider a real-valued random variable r whose range is R (e.g., for a probability the range is one, and for an information gain the range is log c, where c is the number of classes). Suppose we hav..."
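The bound quoted in this excerpt has a standard closed form: after n independent observations of a variable with range R, the true mean lies within ε = sqrt(R² ln(1/δ) / (2n)) of the observed mean with probability 1 − δ. A minimal sketch of that formula (the function name is illustrative, not from the paper):

```python
import math

def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Half-width of the Hoeffding confidence interval: after n independent
    observations of a variable with the given range, the true mean lies
    within epsilon of the observed mean with probability 1 - delta."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

# The range for information gain over c classes is log2(c), as the excerpt notes.
c = 2
eps = hoeffding_epsilon(value_range=math.log2(c), delta=1e-7, n=50_000)
```

Note that ε shrinks as 1/sqrt(n), which is what lets a Hoeffding tree commit to a split after seeing only a bounded number of examples at each node.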

682 | Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm
- Littlestone
- 1988

Citation Context: "...KDD tasks other than supervised learning have appeared in recent years (e.g., clustering [4] and association rule mining [19]). A substantial theoretical literature on online algorithms exists (e.g., [8]), but it focuses on weak learners (e.g., linear separators), because little can be proved about strong ones like decision trees. 6. FUTURE WORK We plan to shortly compare VFDT with SPRINT/SLIQ. VFDT..."

393 | Sampling large databases for association rules
- Toivonen
- 1996

Citation Context: "... data streams. A number of efficient incremental or single-pass algorithms for KDD tasks other than supervised learning have appeared in recent years (e.g., clustering [4] and association rule mining [19]). A substantial theoretical literature on online algorithms exists (e.g., [8]), but it focuses on weak learners (e.g., linear separators), because little can be proved about strong ones like decision..."

272 | SPRINT: A scalable parallel classifier for data mining
- Shafer, Agrawal, et al.
- 1996

Citation Context: "...re produced because we are unable to take full advantage of the data. Thus the development of highly efficient algorithms becomes a priority. Currently, the most efficient algorithms available (e.g., [17]) concentrate on making it possible to mine databases that do not fit in main memory by only requiring sequential scans of the disk. But even these algorithms have only been tested on up to a few mill..."

206 | SLIQ: A fast scalable classifier for data mining
- Mehta, Agrawal, et al.
- 1996

Citation Context: "...assume that all training examples can be stored simultaneously in main memory, and are thus severely limited in the number of examples they can learn from. Disk-based decision tree learners like SLIQ [10] and SPRINT [17] assume the examples are stored on disk, and learn by repeatedly reading them in sequentially (effectively once per level in the tree). While this greatly increases the size of usable..."

173 | Simultaneous Statistical Inference
- Miller
- 1981

Citation Context: "...best choice is negligible. We plan to lift this assumption in future work. If the attributes at a given node are (pessimistically) assumed independent, it simply involves a Bonferroni correction to δ [11]. If the observed ∆G > ε then the Hoeffding bound guarantees that the true ∆G ≥ ∆G − ε > 0 with probability 1 − δ, and therefore that Xa is indeed the best attribute with probability 1 − δ. This is valid a..."
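The decision rule described in this excerpt — split when the observed gain gap ∆G between the best and second-best attribute exceeds the Hoeffding ε, pessimistically tightening δ with a Bonferroni correction over the candidate attributes — can be sketched as follows. The helper name and signature are hypothetical, and dividing δ by the number of candidates is one standard reading of "a Bonferroni correction to δ":

```python
import math

def gain_gap_is_reliable(best_gain: float, second_gain: float,
                         value_range: float, delta: float, n: int,
                         n_candidates: int = 1) -> bool:
    """True when the observed Delta-G = best_gain - second_gain exceeds the
    Hoeffding epsilon for n examples, so the apparently best attribute is
    truly best with probability at least 1 - delta. A pessimistic
    Bonferroni correction spreads delta across the candidate attributes."""
    corrected_delta = delta / max(n_candidates, 1)
    eps = math.sqrt((value_range ** 2)
                    * math.log(1.0 / corrected_delta) / (2.0 * n))
    return (best_gain - second_gain) > eps
```

The correction only enlarges ε (since a smaller δ gives a looser bound), so it can delay a split but never makes the test less conservative; the paper's full procedure includes further refinements beyond this single statistical check.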

170 | Incremental induction of decision trees
- Utgoff
- 1989

Citation Context: "...are fulfilled by incremental learning methods (also known as online, successive or sequential methods), on which a substantial literature exists. However, the available algorithms of this type (e.g., [20]) have significant shortcomings from the KDD point of view. Some are reasonably efficient, but do not guarantee that the model learned will be similar to the one obtained by learning on the same data..."

132 | Efficient algorithms for minimizing cross validation error
- Moore, Lee
- 1994

Citation Context: "...del for choosing the size of subsamples to use in comparing attributes. Maron and Moore [9] used Hoeffding bounds to speed selection of instance-based regression models via cross-validation (see also [12]). Gratch's Sequential ID3 [6] used a statistical method to minimize the number of examples needed to choose each split in a decision tree. (Sequential ID3's guarantees of similarity to the batch tree..."

115 | Incremental clustering for mining in a data warehousing environment
- Ester, Kriegel, et al.
- 1998

Citation Context: "...ble for learning from high-speed data streams. A number of efficient incremental or single-pass algorithms for KDD tasks other than supervised learning have appeared in recent years (e.g., clustering [4] and association rule mining [19]). A substantial theoretical literature on online algorithms exists (e.g., [8]), but it focuses on weak learners (e.g., linear separators), because little can be prove..."

107 | Hoeffding races: Accelerating model selection search for classification and function approximation
- Maron, Moore
- 1994

Citation Context: same passage as the Hoeffding (1963) entry above (the two works are cited jointly as [7, 9]).

105 | BOAT — optimistic decision tree construction
- Gehrke, Ganti, et al.
- 1999

Citation Context: "...ntial ID3's guarantees of similarity to the batch tree were much looser than those derived here for Hoeffding trees, and it was only tested on repeatedly sampled small datasets.) Gehrke et al.'s BOAT [5] learned an approximate tree using a fixed-size subsample, and then refined it by scanning the full database. Provost et al. [14] studied different strategies for mining larger and larger subsamples u..."

95 | Efficient progressive sampling
- Elomaa, Provost, et al.
- 1999

Citation Context: "...only tested on repeatedly sampled small datasets.) Gehrke et al.'s BOAT [5] learned an approximate tree using a fixed-size subsample, and then refined it by scanning the full database. Provost et al. [14] studied different strategies for mining larger and larger subsamples until accuracy (apparently) asymptotes. In contrast to systems that learn in main memory by subsampling, systems like SLIQ [10] an..."

90 | Megainduction: Machine learning on very large databases
- Catlett
- 1991

Citation Context: "...to directly mine online data sources (i.e., without ever storing the examples), and to build potentially very complex trees with acceptable computational cost. We achieve this by noting with Catlett [2] and others that, in order to find the best attribute to test at a given node, it may be sufficient to consider only a small subset of the training examples that pass through that node. Thus, given a..."

89 | Oversearching and layered search in empirical learning
- Quinlan
- 1995

Citation Context: "...esources for a massive search are available, but carrying out such a search over the small samples available (typically less than 10,000 examples) often leads to overfitting or “data dredging” (e.g., [22, 16]). Thus overfitting avoidance becomes the main concern, and only a fraction of the available computational power is used [3]. In contrast, in many (if not most) present-day data mining applications, t..."

88 | Organization-Based Analysis of Web-Object Sharing and Caching
- Wolman, Voelker, et al.
- 1999

Citation Context: "... 4.3 Web data We are currently applying VFDT to mining the stream of Web page requests emanating from the whole University of Washington main campus. The nature of the data is described in detail in [23]. In our experiments so far we have used a one-week anonymized trace of all the external web accesses made from the university campus. There were 23,000 active clients during this one-week trace perio..."

80 | Opus: An efficient admissible algorithm for unordered search
- Webb
- 1995

Citation Context: same passage as the Oversearching entry above (the two works are cited jointly as [22, 16]).

46 | An improved algorithm for incremental induction of decision trees
- Utgoff
- 1994

Citation Context: "... mentioned previously, there is a large literature on incremental learning, which space limitations preclude reviewing here. The system most closely related to ours is Utgoff's [20] ID5R (extended in [21]). ID5R learns the same tree as ID3 (a batch method), by restructuring subtrees as needed. While its learning time is linear in the number of examples, it is worst-case exponential in the number of at..."

41 | Decision theoretic subsampling for induction on large databases
- Musick, Catlett, et al.
- 1993

Citation Context: "...the following. Catlett [2] proposed several heuristic methods for extending RAM-based batch decision-tree learners to datasets with up to hundreds of thousands of examples. Musick, Catlett and Russell [13] proposed and tested (but did not implement in a learner) a theoretical model for choosing the size of subsamples to use in comparing attributes. Maron and Moore [9] used Hoeffding bounds to speed sel..."

28 | Overfitting and undercomputing in machine learning
- Dietterich
- 1995

Citation Context: "...10,000 examples) often leads to overfitting or “data dredging” (e.g., [22, 16]). Thus overfitting avoidance becomes the main concern, and only a fraction of the available computational power is used [3]. In contrast, in many (if not most) present-day data mining applications, the bottleneck is time and memory, not examples. The latter are typically in over-supply, in the sense that it is impossible..."

13 | Anytime exploratory data analysis for massive data sets
- Smyth, Wolpert
- 1997

Citation Context: "...n the time required to read examples from disk multiple times. VFDT's speed and anytime character make it ideal for interactive data mining; we plan to also study its application in this context (see [18]). Other directions for future work include: further developing the application of VFDT to Web log data; studying other applications of VFDT (e.g., intrusion detection); using nondiscretized numeric a..."

7 | Sequential inductive learning
- Gratch
- 1996