## Efficient Progressive Sampling (1999)

### Download Links

- [www-eksl.cs.umass.edu]
- [eksl.isi.edu]
- [crue.isi.edu]
- [w3.sista.arizona.edu]
- [eksl-www.cs.umass.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining

Citations: 98 (9 self)

### BibTeX

```bibtex
@inproceedings{Provost99efficientprogressive,
  author    = {Foster Provost and David Jensen and Tim Oates},
  title     = {Efficient Progressive Sampling},
  booktitle = {Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining},
  year      = {1999},
  pages     = {23--32},
  publisher = {ACM Press}
}
```

### Abstract

Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size is rarely obvious. We analyze methods for progressive sampling: using progressively larger samples as long as model accuracy improves. We explore several notions of efficient progressive sampling. We analyze efficiency relative to induction with all instances; we show that a simple, geometric sampling schedule is asymptotically optimal, and we describe how best to take into account prior expectations of accuracy convergence. We then describe the issues involved in instantiating an efficient progressive sampler, including how to detect convergence. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling can be remarkably efficient.
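The geometric schedule and accuracy-plateau loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `train_and_score` is a hypothetical stand-in for any induction algorithm plus an accuracy estimate, and the plateau test is a deliberately simple placeholder for real convergence detection.

```python
def geometric_schedule(n0, a, n_total):
    """Sample sizes n0, a*n0, a^2*n0, ..., capped at the full data set size."""
    sizes, n = [], n0
    while n < n_total:
        sizes.append(n)
        n *= a
    sizes.append(n_total)
    return sizes

def progressive_sample(data, train_and_score, n0=100, a=2, tol=1e-3):
    """Train on progressively larger samples until accuracy stops improving.

    train_and_score(sample) is a hypothetical callable returning an
    accuracy estimate; the tolerance test below is a placeholder for
    the paper's convergence-detection methods.
    """
    best_acc, last_n = 0.0, None
    for n in geometric_schedule(n0, a, len(data)):
        acc = train_and_score(data[:n])
        if acc - best_acc < tol:      # accuracy has plateaued: stop here
            return n, acc
        best_acc, last_n = acc, n
    return last_n, best_acc
```

For example, with a simulated learning curve that plateaus at 800 instances, the loop stops one schedule point past the plateau rather than touching all instances.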

### Citations

5494 |
C4.5: Programs for machine learning
- Quinlan
- 1993
Citation Context: ...run-time complexity (in n) of the underlying induction algorithm. Run-time complexity models are not always easy to obtain. For example, our empirical results below use the decision-tree algorithm C4.5 [17], for which reported time complexity varies widely. Moreover, DP sampling requires the actual run-time complexity for the problem in question, rather than a worst-case complexity. We obtained empirical...

3110 |
UCI repository of machine learning databases
- Blake, Keogh, et al.
- 1998
Citation Context: ...n, and low values for very large n. For example, data from Oates and Jensen [12, 13] show that the distribution of the number of instances needed for convergence over a large set of the UCI databases [10] is roughly log-normal. Given such expectations about n_min, is it possible to construct the schedule with the minimum expected cost of convergence? This seems a daunting task. For each value of n, fr...

2953 |
Dynamic Programming
- Bellman
- 1957
Citation Context: ...For each value of n, from 1 to N, a model can either be built or not, leading to 2^N possible schedules. However, identification of the optimal schedule can be cast in terms of dynamic programming [1], yielding an algorithm that requires O(N^2) space and O(N^3) time. Let f(n) be the cost of building a model with n instances and determining whether accuracy has converged. As described above, let ...
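The dynamic-programming idea in this context can be sketched as follows, under stated assumptions: `f(n)` is the cost of building and testing a model on n instances, and `tail_prob(n)` encodes the prior P(n_min > n). This is a simplified O(N^2) recurrence for illustration, not the paper's exact formulation (which is reported as O(N^2) space and O(N^3) time).

```python
def optimal_schedule(N, f, tail_prob):
    """Expected-cost-minimizing sampling schedule via dynamic programming.

    Assumptions (a sketch, not the paper's exact recurrence): f(n) is
    the cost of building a model with n instances and testing for
    convergence; tail_prob(n) = P(n_min > n) is the prior probability
    that convergence needs more than n instances.  A model of size j is
    built only if the previous schedule point i did not already show
    convergence, which happens with probability tail_prob(i).
    """
    INF = float("inf")
    cost = [0.0] + [INF] * N   # cost[j]: best expected cost ending at point j
    back = [0] * (N + 1)
    for j in range(1, N + 1):
        for i in range(j):     # previous schedule point (0 = none processed yet)
            c = cost[i] + f(j) * tail_prob(i)
            if c < cost[j]:
                cost[j], back[j] = c, i
    sched, j = [], N           # the schedule must end with all N instances
    while j > 0:
        sched.append(j)
        j = back[j]
    return sched[::-1], cost[N]
```

By construction the single-shot schedule {N} is always a candidate, so the returned expected cost never exceeds f(N).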

1761 | A theory of the learnable
- Valiant
- 1984
Citation Context: ...t. In their paper on arithmetic sampling, John and Langley [8] model the learning curve as sampling progresses. They determine convergence using a stopping criterion modeled after the work of Valiant [18]. Specifically, convergence is reached when Pr((acc(N) - acc(n_i)) > ε) ≤ δ, where acc(x) is the accuracy of the model that an algorithm produces after seeing x instances, ε refers to the m...
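A stopping rule of this Valiant-inspired flavor might be approximated as below. The normal-approximation confidence bound and the per-fold accuracy inputs are assumptions for illustration, not the exact criterion John and Langley use.

```python
import math
import statistics

def improvement_unlikely(prev_accs, curr_accs, eps=0.01, delta=0.05):
    """Sketch of a Valiant-style stopping rule (an illustration, not the
    exact test from the cited work): declare convergence when, with
    confidence 1 - delta, the accuracy gain from the last schedule step
    is below eps.  prev_accs and curr_accs are per-fold accuracy
    estimates at the two most recent sample sizes; the normal
    approximation is an assumption.
    """
    gain = statistics.mean(curr_accs) - statistics.mean(prev_accs)
    se = math.sqrt(statistics.variance(prev_accs) / len(prev_accs)
                   + statistics.variance(curr_accs) / len(curr_accs))
    z = 1.645  # one-sided normal quantile, hard-coded for delta = 0.05
    return gain + z * se < eps  # upper confidence bound on the gain below eps
```

A flat pair of accuracy samples triggers convergence; a clearly improving pair does not.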

331 |
Learning efficient classification procedures and their application to chess end games
- Quinlan
- 1983
Citation Context: ...died the general case. Methods for active sampling, choosing subsequent samples based upon the models learned previously, are of particular interest. A classic example of active sampling is windowing [16], wherein subsequent sampling chooses instances for which the current model makes errors. Active sampling changes the learning curve. For example, on noisy data, windowing learning curves are notoriou...

220 |
Depth-first iterative deepening: An optimal admissible tree search
- Korf
- 1985
Citation Context: ...rithm on subsets of size a^i · n_0 for i = 0, 1, ..., b, where b + 1 is the number of samples processed before detecting convergence. (Footnote 2: We follow the reasoning of Korf, who shows that progressive deepening is an optimal schedule for conducting depth-first search when the smallest sufficient depth is unknown [9] [15].) Now, we assume convergence is well detected, so a^(b-1) · n_0 < n_min...

170 | Incremental induction of decision trees
- Utgoff
- 1989
Citation Context: ...5 run time is often claimed to be linear in the number of instances, for non-numeric data sets. This claim is based on an analysis by Utgoff, where he shows the asymptotic time complexity to be O(n) (Utgoff, 1989). With numeric data, sorting adds a log n term at each node. However, C4.5 has been observed to be worse than O(n^2) (Catlett, 1991a). One explanation for the discrepancy is that Utgoff actually sho...

93 | A survey of methods for scaling up inductive algorithms, Data mining and knowledge discovery
- Provost, Kolluri
- 1999
Citation Context: ...can overshoot n_min greatly, they then calculate n_δ for an efficient arithmetic schedule, and revise the estimate after executing each schedule point. Other sequential multi-sample learning methods [14] are degenerate instances of progressive sampling, typically using fixed arithmetic schedules and treating convergence detection simplistically, if at all. For this paper, we have considered only draw...

91 |
Megainduction: Machine learning on very large databases
- Catlett
- 1991
Citation Context: ...ves typically have a steeply sloping portion early in the curve, a more gently sloping middle portion, and a plateau late in the curve. The middle portion can be extremely large in some curves (e.g., [2, 3, 6]) and almost entirely missing in others. The plateau occurs when adding additional data instances does not improve accuracy. The plateau, and even the entire middle portion, can be missing from curves...

73 | Static versus dynamic sampling for data mining
- John, Langley
- 1996
Citation Context: ...use for comparison the schedule composed of a single data set with all instances, S_N = {N}. We also will consider the simple schedule generated by an omniscient oracle, S_O = {n_min}. John and Langley [8] define a progressive sampling approach we call arithmetic sampling, using the schedule S_a = {n_0 + i·n_δ} = {n_0, n_0 + n_δ, n_0 + 2n_δ, ..., n_0 + k·n_δ}. An example ...
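The practical difference between the arithmetic and geometric schedule families is the number of models built before reaching the full data set. A quick sketch, with illustrative values of n_0, n_δ, and the geometric factor a (not values from the paper):

```python
def arithmetic_schedule(n0, nd, n_total):
    """S_a = {n0, n0 + nd, n0 + 2*nd, ...}: O(n_total / nd) model builds."""
    sched = list(range(n0, n_total + 1, nd))
    if sched[-1] != n_total:
        sched.append(n_total)   # always finish with the full data set
    return sched

def geometric_schedule(n0, a, n_total):
    """S_g = {n0, a*n0, a^2*n0, ...}: O(log(n_total / n0)) model builds."""
    sched, n = [], n0
    while n < n_total:
        sched.append(n)
        n *= a
    sched.append(n_total)
    return sched

arith = arithmetic_schedule(100, 100, 1_000_000)
geom = geometric_schedule(100, 2, 1_000_000)
# with one million instances, the arithmetic schedule builds 10,000
# models while the geometric one builds 15
```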

69 | The Effects of Training Set Size on Decision Tree Complexity
- Oates
- 1997
Citation Context: ...on small samples. However, empirical studies of the application of standard induction algorithms to large data sets---those of relevance to this paper---have shown learning curves to be well behaved [3, 4, 6, 12, 13]. In addition, practical progressive sampling demands only that learning curves are well behaved at the level [figure text: Compute schedule S = {n_0, n_1, n_2, ..., n_k} of sample sizes; n ← n_0; M ← model in...]

62 |
Megainduction: a test flight
- Catlett
- 1991
Citation Context: ...ves typically have a steeply sloping portion early in the curve, a more gently sloping middle portion, and a plateau late in the curve. The middle portion can be extremely large in some curves (e.g., [2, 3, 6]) and almost entirely missing in others. The plateau occurs when adding additional data instances does not improve accuracy. The plateau, and even the entire middle portion, can be missing from curves...

53 | Rigorous Learning Curve Bounds from Statistical Mechanics
- Haussler, Kearns, et al.
- 1994
Citation Context: ...Locality is defined within a particular progressive sampling procedure. Not all learning curves are well behaved. For example, theoretical analyses of learning curves based on statistical mechanics [7, 19] have shown that sudden increases in accuracy are possible, particularly on small samples. However, empirical studies of the application of standard induction algorithms to large data sets---those of...

46 | Large data sets lead to overly complex models: an explanation and a solution
- Oates, Jensen
- 1998
Citation Context: ...on small samples. However, empirical studies of the application of standard induction algorithms to large data sets---those of relevance to this paper---have shown learning curves to be well behaved [3, 4, 6, 12, 13]. In addition, practical progressive sampling demands only that learning curves are well behaved at the level [figure text: Compute schedule S = {n_0, n_1, n_2, ..., n_k} of sample sizes; n ← n_0; M ← model in...]

41 |
Decision theoretic subsampling for induction on large databases
- Musick, Catlett, et al.
- 1993
Citation Context: ...can be done on convergence detection. With very slow sampling, the efficiency of progressive sampling will be the same as if n_min were known a priori. [Section 7: Other related work] The method of Musick et al. [11] for determining the best attribute at each decision-tree node can be seen as an instance of the generic progressive sampling algorithm shown in figure 2, if we regard each node of the decision tree a...

35 |
The statistical mechanics of learning a rule
- Watkin, Rau, et al.
- 1993
Citation Context: ...Locality is defined within a particular progressive sampling procedure. Not all learning curves are well behaved. For example, theoretical analyses of learning curves based on statistical mechanics [7, 19] have shown that sudden increases in accuracy are possible, particularly on small samples. However, empirical studies of the application of standard induction algorithms to large data sets---those of...

19 |
Modeling decision tree performance with the power law
- Frey, Fisher
- 1999
Citation Context: ...on small samples. However, empirical studies of the application of standard induction algorithms to large data sets---those of relevance to this paper---have shown learning curves to be well behaved [3, 4, 6, 12, 13]. In addition, practical progressive sampling demands only that learning curves are well behaved at the level [figure text: Compute schedule S = {n_0, n_1, n_2, ..., n_k} of sample sizes; n ← n_0; M ← model in...]

11 |
Sample size and misclassification: Is more always better? Working Paper AMSCAT-WP-97-118, AMS Center for Advanced Technologies
- Harris-Jones, Haines
- 1997
Citation Context: ...ves typically have a steeply sloping portion early in the curve, a more gently sloping middle portion, and a plateau late in the curve. The middle portion can be extremely large in some curves (e.g., [2, 3, 6]) and almost entirely missing in others. The plateau occurs when adding additional data instances does not improve accuracy. The plateau, and even the entire middle portion, can be missing from curve...

10 |
Iterative weakening: Optimal and nearoptimal policies for the selection of search bias
- Provost
- 1993
Citation Context: ...m on subsets of size a^i · n_0 for i = 0, 1, ..., b, where b + 1 is the number of samples processed before detecting convergence. (Footnote 2: We follow the reasoning of Korf, who shows that progressive deepening is an optimal schedule for conducting depth-first search when the smallest sufficient depth is unknown [9] [15].) Now, we assume convergence is well detected, so a^(b-1) · n_0 < n_min ≤ a^b ...

2 |
SPSS for Windows
- J
- 1993
Citation Context: ...urve. For example, on noisy data, windowing learning curves are notoriously ill behaved: subsequent samples contain increasing amounts of noise, and performance often decreases as sampling progresses [5]. It would be interesting to examine more closely the use of the techniques outlined above in the context of active sampling, and the potential synergies. [Section 8: Conclusion] With this work we have made subs...
