## Sequential Update of ADtrees

Citations: 1 (0 self)

### BibTeX

@MISC{Roure_sequentialupdate,
  author = {Josep Roure and Andrew W. Moore},
  title = {Sequential Update of ADtrees},
  year = {2006}
}

### Abstract

Increasingly, data-mining algorithms must deal with databases that continuously grow over time. These algorithms must avoid repeatedly scanning their databases. When database attributes are symbolic, ADtrees have already been shown to be efficient structures for storing sufficient statistics in main memory and for accelerating the mining process in batch environments. Here we present an efficient method to sequentially update ADtrees that is suitable for incremental environments.

### Citations

2871 | UCI repository of machine learning databases
- Blake, Merz
- 1998

Citation Context: ... [Figure 7: ADtree size vs. number of data; curves for the Buffering Strategy and the Similar ordering] ... We used the Adult, Connect and Covtype (Blake & Merz, 1998) and Alarm (Cooper & Herskovits, 1992) datasets, presented in the three different kinds of orderings mentioned in Section 3.1. We tried five different orders of each type and all the results presented ...

1079 | Bayesian method for the induction of probabilistic networks from data
- Cooper, Herskovits
- 1992

Citation Context: ...back of this simple approach. This is further illustrated in Figure 4, which shows the number of ADnodes (y-axis) when ADtrees are grown with a certain amount of data (x-axis). We used the Alarm (Cooper & Herskovits, 1992) dataset presented in three different orders of data rows. In the first order, similar rows are presented together. In the second, dissimilar rows are presented together. In the third, rows are rando...

120 | Cached sufficient statistics for efficient machine learning with large datasets
- Moore, Lee
- 1998

Citation Context: ...ning algorithms deal with datasets with symbolic attributes. Such algorithms generally perform a great number of counting queries, and so they spend much time doing direct counting from data. ADtrees (Moore & Lee, 1998) are sparse data structures able to answer any counting query efficiently with ... [footnote: Appearing in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006. Copyright 2006...]
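
The snippet above describes ADtrees as sparse structures that cache sufficient statistics so counting queries need not rescan the data. As a rough illustration of the query-side behavior only (not the authors' ADtree, which is a tree with most-common-value pruning), a flat cache of conjunction counts answers the same kind of query:

```python
from itertools import combinations

def build_count_cache(rows, max_arity=2):
    """Precompute counts of all attribute-value conjunctions up to
    max_arity attributes -- a flat stand-in for an ADtree."""
    cache = {}
    for row in rows:
        items = tuple(sorted(row.items()))
        for r in range(1, max_arity + 1):
            for combo in combinations(items, r):
                cache[combo] = cache.get(combo, 0) + 1
    return cache

def count(cache, query):
    """Answer a counting query such as {'sex': 'M', 'smoker': 'Y'}
    without touching the original rows."""
    return cache.get(tuple(sorted(query.items())), 0)

rows = [{"sex": "M", "smoker": "Y"},
        {"sex": "M", "smoker": "N"},
        {"sex": "F", "smoker": "Y"}]
cache = build_count_cache(rows)
print(count(cache, {"sex": "M"}))                 # 2
print(count(cache, {"sex": "M", "smoker": "Y"}))  # 1
```

The actual ADtree avoids this flat cache's combinatorial blow-up by storing counts in a tree and omitting the most common value at each node.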

40 | Optimal reinsertion: A new search operator for accelerated and more accurate Bayesian network structure learning
- Moore, Wong
- 2003

Citation Context: ...with respect to time. ADtrees have been seen to accelerate algorithms such as Bayes network learning (Teyssier & Koller, 2005; Goldenberg & Moore, 2004; Moore & Wong, 2003), association rules (Anderson & Moore, 1998) and feature selection (Moore & Lee, 1998). In this paper we propose an incremental version of ADtrees so that learning algorithms use them in order to avo...

29 | Tractable learning of large Bayes net structures from sparse data
- Goldenberg, Moore
- 2004

Citation Context: ...with respect to time. ADtrees have been seen to accelerate algorithms such as Bayes network learning (Teyssier & Koller, 2005; Goldenberg & Moore, 2004; Moore & Wong, 2003), association rules (Anderson & Moore, 1998) and feature selection (Moore & Lee, 1998). In this paper we propose an incremental version of ADtrees so that learning algorithms use ...

29 | Mining complex models from arbitrarily large databases in constant time
- Hulten, Domingos
- 2002

Citation Context: ...ansactions. In these environments, sequential or incremental learning methods become particularly relevant, since they can revise existing models efficiently. It is widely accepted in the literature (Hulten & Domingos, 2002) that incremental algorithms should fulfill four constraints: (1) build a model using only one scan of the data, (2) require small and constant time per record, (3) require a fixed amount of memory irres...
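
The four constraints quoted above (one scan of the data, constant time per record, fixed memory, and, by implication, an anytime model) can be illustrated with a toy single-attribute counter; all names here are illustrative, not from the paper:

```python
class IncrementalCounter:
    """Toy one-pass frequency tracker for one symbolic attribute.
    Memory is bounded by the attribute's arity, not the stream
    length, and the current model (the MCV) is always available."""

    def __init__(self):
        self.counts = {}
        self.n = 0

    def update(self, value):
        # Constraint 2: constant work per record.
        self.counts[value] = self.counts.get(value, 0) + 1
        self.n += 1

    def mcv(self):
        # An anytime answer: usable whenever it is asked for.
        return max(self.counts, key=self.counts.get) if self.counts else None

c = IncrementalCounter()
for v in ["a", "b", "a", "c", "a"]:  # Constraint 1: a single scan.
    c.update(v)
print(c.mcv(), c.n)  # a 5
```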

23 | AD-trees for fast counting and for fast learning of association rules. In: Knowledge Discovery from Databases Conference
- Anderson, Moore
- 1998

Citation Context: ...with respect to time. ADtrees have been seen to accelerate algorithms such as Bayes network learning (Teyssier & Koller, 2005; Goldenberg & Moore, 2004; Moore & Wong, 2003), association rules (Anderson & Moore, 1998) and feature selection (Moore & Lee, 1998). In this paper we propose an incremental version of ADtrees so that learning algorithms use them in order to avoid multiple scans of the data and increasing memo...

18 | Order effects in incremental learning
- Langley
- 1995

Citation Context: ...twice the rows shown) and that the former is bigger than the latter. See also that the largest is grown with similar rows together. This behavior in incremental methods is known as ordering effects (Langley, 1995). That is, an algorithm may obtain different models from the same database presented in different orders. 3.2. Calculate the Actual MCVs. To avoid ordering effects we present here an algorithm to obta...
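
The ordering effect described above, where different input orders yield different models, is easy to reproduce with any scheme that commits early. The function below is a deliberately naive, hypothetical example, not the paper's algorithm:

```python
def early_commit_mcv(values, commit_after=3):
    """Naively commit to the most common value after the first few
    records (hypothetical scheme, only to demonstrate order effects)."""
    counts = {}
    for v in values[:commit_after]:
        counts[v] = counts.get(v, 0) + 1
    return max(counts, key=counts.get)

data = ["a", "a", "b", "b", "b"]
print(early_commit_mcv(data))        # 'a': committed before the b's arrive
print(early_commit_mcv(data[::-1]))  # 'b': same data, different order
```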

15 | A two-sample multiple-decision procedure for ranking means of normal populations with a common unknown variance
- Bechhofer, Dunnett, et al.
- 1954

Citation Context: ...and the ratio between the MCV and the second MCV is larger than α. In order to theoretically justify that this approach will make the correct choice with high probability we use the following result (Bechhofer et al., 1959): Theorem 1 (Probability of a correct selection). Given a multinomial random variable with k values and R, and any probability vector p with p_{k-1} < p_k, the probability of a correct selection is given b...
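
Theorem 1 concerns the probability that the empirically most frequent value of a multinomial is its true most common value. Without reproducing the theorem's (truncated) closed form, that probability can be estimated by simulation; the function name and parameters below are assumptions for illustration:

```python
import random

def p_correct_selection(p, n, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that the most frequent
    value among n draws from distribution p is the true MCV."""
    rng = random.Random(seed)
    values = list(range(len(p)))
    true_mcv = p.index(max(p))
    hits = 0
    for _ in range(trials):
        sample = rng.choices(values, weights=p, k=n)
        counts = [sample.count(v) for v in values]
        if counts.index(max(counts)) == true_mcv:
            hits += 1
    return hits / trials

print(p_correct_selection([0.5, 0.3, 0.2], n=50))
```

As the gap between the MCV and the runner-up widens, or as n grows, the estimate approaches 1, which is what makes a ratio test like the one in the text plausible.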

15 | A dynamic adaptation of AD-trees for efficient machine learning on large data sets
- Komarek, Moore
- 2000

Citation Context: ...ld be that if many data are buffered and many sub-ADtrees are not grown, then queries would require more time. This could be solved by growing those sub-ADtrees being queried (as in dynamic ADtrees (Komarek & Moore, 2000)), even if the probability of being wrong is high. We could still keep the buffers so that, when enough records were available, we could drop the ill-grown sub-ADtrees and build new ones. However, we...

8 | A buffering strategy to avoid ordering effects in clustering
- Talavera, Roure
- 1998

Citation Context: ...ed and that the ADtrees built with this technique are close to those built with all the database at once. Buffering strategies have proven to work well in other fields such as incremental clustering (Talavera & Roure, 1998). A drawback of this buffering strategy could be that if many data are buffered and many sub-ADtrees are not grown, then queries would require more time. This could be solved by growing those sub-ADt...
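
The buffering strategy the context describes (hold records back until there is enough evidence, then build counts once) can be sketched as follows; the threshold and all names are assumptions for illustration, not the paper's API:

```python
def buffered_counts(stream, threshold=5):
    """Buffer records until `threshold` of them have arrived, then
    build counts once from the buffer and update them incrementally
    afterwards (a rough sketch of the buffering strategy)."""
    buffer, counts = [], None
    for record in stream:
        if counts is None:
            buffer.append(record)
            if len(buffer) >= threshold:
                counts = {}
                for v in buffer:  # grow once, from buffered evidence
                    counts[v] = counts.get(v, 0) + 1
        else:
            counts[record] = counts.get(record, 0) + 1
    return counts

print(buffered_counts(["x", "y", "x", "x", "y", "x"]))  # {'x': 4, 'y': 2}
```

Deferring the build until the buffer fills is what reduces ordering effects: the one-time decision is made on more evidence than a record-by-record scheme would have.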

4 | Incremental learning of tree augmented naive bayes classifiers
- Roure
- 2002

3 | Incremental hill-climbing search applied to Bayesian network structure learning
- Roure
- 2004

2 | An incremental algorithm for tree-shaped Bayesian network learning
- Roure
- 2002

1 | Incremental augmented naive bayes classifiers
- Roure