Results 1–10 of 10
Efficient Progressive Sampling
, 1999
Abstract

Cited by 113 (10 self)
Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size is rarely obvious. We analyze methods for progressive sampling: starting with small samples and progressively increasing them as long as model accuracy improves. We show that a simple, geometric sampling schedule is efficient in an asymptotic sense. We then explore the notion of optimal efficiency: what is the absolute best sampling schedule? We describe the issues involved in instantiating an "optimally efficient" progressive sampler. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling often is preferable to analyzing all data instances.
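The geometric schedule this abstract argues for can be sketched in a few lines. Here `train_and_score` is a hypothetical stand-in for whatever induction algorithm and held-out evaluation one uses, and the starting size `n0`, growth `factor`, and convergence threshold `eps` are illustrative choices of ours, not values from the paper:

```python
# Sketch of progressive sampling with a geometric schedule. The sampler
# grows the training sample by a constant factor until accuracy stops
# improving by at least `eps`, then stops without touching the rest of
# the data.
def progressive_sample(data, train_and_score, n0=100, factor=2, eps=0.001):
    """Return (sample size used, best accuracy seen)."""
    n, best = n0, -1.0
    while n <= len(data):
        acc = train_and_score(data[:n])
        if acc - best < eps:          # convergence: no meaningful gain
            return n, max(acc, best)
        best, n = acc, n * factor     # geometric schedule: n0, 2*n0, 4*n0, ...
    return len(data), best
```

With a learning curve that flattens as samples grow, the sampler stops well short of the full data set, which is exactly the efficiency argument the abstract makes.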
Inductive Policy: The Pragmatics of Bias Selection
 MACHINE LEARNING
, 1995
Abstract

Cited by 47 (12 self)
This paper extends the currently accepted model of inductive bias by identifying six categories of bias and separates inductive bias from the policy for its selection (the inductive policy). We analyze existing "bias selection" systems, examining the similarities and differences in their inductive policies, and identify three techniques useful for building inductive policies. We then present a framework for representing and automatically selecting a wide variety of biases and describe experiments with an instantiation of the framework addressing various pragmatic tradeoffs of time, space, accuracy, and the cost of errors. The experiments show that a common framework can be used to implement policies for a variety of different types of bias selection, such as parameter selection, term selection, and example selection, using similar techniques. The experiments also show that different tradeoffs can be made by the implementation of different policies; for example, from the same data different rule sets can be learned based on different tradeoffs of accuracy versus the cost of erroneous predictions.
Beam-stack search: Integrating backtracking with beam search
 In International Conference on Automated Planning and Scheduling (ICAPS)
, 2005
Abstract

Cited by 32 (3 self)
We describe a method for transforming beam search into a complete search algorithm that is guaranteed to find an optimal solution. Called beam-stack search, the algorithm uses a new data structure, called a beam stack, that makes it possible to integrate systematic backtracking with beam search. The resulting search algorithm is an anytime algorithm that finds a good, suboptimal solution quickly, like beam search, and then backtracks and continues to find improved solutions until convergence to an optimal solution. We describe a memory-efficient implementation of beam-stack search, called divide-and-conquer beam-stack search, as well as an iterative-deepening version of the algorithm. The approach is applied to domain-independent STRIPS planning, and computational results show its advantages.
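A drastically simplified sketch of the backtracking idea follows. The real beam stack records cost ranges per layer rather than storing pruned nodes, so this illustrates only the completeness mechanism (remember what the beam discarded and come back to it), not the paper's data structure; it assumes a finite search tree:

```python
from heapq import nsmallest

def backtracking_beam_search(start, successors, is_goal, width=2):
    """Beam search over a finite tree that stacks pruned layers and
    backtracks to them, so every node is eventually expanded and the
    returned (cost, node) goal is optimal."""
    # frontier entries are (path_cost, node); the stack holds frontiers
    # still to be explored, including layers of previously pruned entries
    stack = [[(0, start)]]
    best = None
    while stack:
        frontier = stack.pop()
        beam = nsmallest(width, frontier, key=lambda e: e[0])
        pruned = [e for e in frontier if e not in beam]
        if pruned:
            stack.append(pruned)          # backtrack point: revisit later
        next_layer = []
        for c, node in beam:
            if is_goal(node):
                if best is None or c < best[0]:
                    best = (c, node)      # anytime: keep best goal so far
            else:
                next_layer.extend((c + w, child) for child, w in successors(node))
        if next_layer:
            stack.append(next_layer)
    return best
```

Like the algorithm described above, this finds a suboptimal goal quickly (within the beam) and improves it as the pruned layers are replayed.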
A Survey of Methods for Scaling Up Inductive Learning Algorithms
, 1997
Abstract

Cited by 20 (2 self)
Each year, one of the explicit challenges for the KDD research community is to develop methods that facilitate the use of inductive learning algorithms for mining very large databases. By collecting, categorizing, and summarizing past work on scaling up inductive learning algorithms, this paper serves to establish a common ground for researchers addressing the challenge. We begin with a discussion of important, but often tacit, issues related to scaling up learning algorithms. We highlight similarities among methods by categorizing them into three main approaches. For each approach, we then describe, compare, and contrast the different constituent methods, drawing on specific examples from the published literature. Finally, we use the preceding analysis to suggest how one should proceed when dealing with a large problem, and where future research efforts should be focused.
Complete anytime beam search
 In Proc. AAAI 1998
, 1998
Abstract

Cited by 16 (3 self)
Beam search executes a search method, such as best-first search or depth-first search, but may abandon non-promising search avenues in order to reduce complexity. Although it has existed for more than two decades and has been applied to many real-world problems, beam search still suffers from the drawback of possible termination with no solution or a solution of unsatisfactory quality. In this paper, we first propose a domain-independent heuristic for node pruning, and a method to reduce the possibility that beam search will fail. We then develop a complete beam search algorithm. The new algorithm can not only find an optimal solution, but can also reach better solutions sooner than its underlying search method. We apply complete beam search to maximum Boolean satisfiability and the symmetric and asymmetric Traveling Salesman Problems. Our experimental results show that the domain-independent pruning heuristic is effective and the new algorithm significantly improves the performance of its underlying search algorithm.
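The paper's completeness mechanism is a pruning heuristic that is progressively relaxed. A simpler, related way to illustrate making beam search complete and anytime is to rerun it with a doubling beam width until a run prunes nothing; the widening scheme below is our substitution for illustration, not the paper's method, and it assumes a finite search tree:

```python
def widening_beam_search(start, successors, is_goal, max_width=64):
    """Rerun beam search with a doubling width. Each run may improve the
    incumbent (anytime behavior); once a run prunes nothing, the search
    was exhaustive and the incumbent is optimal."""
    width, best = 1, None
    while width <= max_width:
        frontier, pruned_any = [(0, start)], False
        while frontier:
            frontier.sort(key=lambda e: e[0])
            if len(frontier) > width:
                pruned_any = True          # this run is incomplete
            beam, nxt = frontier[:width], []
            for c, node in beam:
                if is_goal(node):
                    if best is None or c < best[0]:
                        best = (c, node)   # anytime improvement
                else:
                    nxt.extend((c + w, child) for child, w in successors(node))
            frontier = nxt
        if not pruned_any:
            break                          # exhaustive run: best is optimal
        width *= 2
    return best
```

Narrow early runs return a solution quickly; later, wider runs recover whatever the narrow beams discarded.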
Temporal Preference Optimization as Weighted Constraint Satisfaction
Abstract

Cited by 10 (0 self)
We present a new efficient algorithm for obtaining utilitarian optimal solutions to Disjunctive Temporal Problems with Preferences (DTPPs). The previous state-of-the-art system achieves temporal preference optimization using a SAT formulation, with its creators attributing its performance to advances in SAT solving techniques. We depart from the SAT encoding and instead introduce the Valued DTP (VDTP). In contrast to the traditional semiring-based formalism that annotates legal tuples of a constraint with preferences, our framework instead assigns elementary costs to the constraints themselves. After proving that the VDTP can express the same set of utilitarian optimal solutions as the DTPP with piecewise-constant preference functions, we develop a method for achieving weighted constraint satisfaction within a meta-CSP search space that has traditionally been used to solve DTPs without preferences. This allows us to directly incorporate several powerful techniques developed in previous decision-based DTP literature. Finally, we present empirical results demonstrating that an implementation of our approach consistently outperforms the SAT-based solver by orders of magnitude.
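A toy rendering of the "costs on constraints" idea: each constraint carries a violation cost, and a solution may violate a constraint by paying that cost, so we seek the assignment minimizing total cost. This exhaustive version ignores the paper's meta-CSP search and temporal structure, and the variables, domains, and weights are invented for illustration:

```python
from itertools import product

def min_cost_assignment(variables, domains, constraints):
    """constraints: list of (cost, predicate) over a full assignment dict.
    A violated predicate adds its cost; returns (min total cost, assignment)."""
    best_cost, best_assign = float('inf'), None
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        cost = sum(c for c, pred in constraints if not pred(assignment))
        if cost < best_cost:
            best_cost, best_assign = cost, assignment
    return best_cost, best_assign
```

The key contrast with the semiring-style formalism described above is that costs attach to whole constraints, not to individual tuples of allowed values.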
Iterative State-Space Reduction for Flexible Computation
, 2000
Abstract

Cited by 5 (0 self)
Flexible computation is a general framework for decision making under limited computational resources. It enables an agent to allocate limited computational resources to maximize its overall performance or utility. In this paper, we present a strategy for flexible computation, which we call iterative state-space reduction. The main ideas are to reduce a problem space that is difficult to search to one that is relatively easy to explore, to use the optimal solution from the reduced space as an approximate solution to the original problem, and to iteratively apply multiple reductions to progressively find better solutions.
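One way to picture the strategy, under our own simplifying assumptions, is numeric: coarsen a continuous space to a small grid, take the grid optimum as the approximate solution, then repeat on a reduced space around the incumbent. This is only an analogy for the reduce-solve-iterate loop, not the paper's reductions:

```python
def iterative_reduction_minimize(f, lo, hi, steps=5, rounds=4):
    """Minimize f on [lo, hi] by solving a sequence of reduced spaces:
    each round searches a coarse grid exactly, then shrinks the interval
    around the best grid point found so far."""
    best_x = None
    for _ in range(rounds):
        xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
        best_x = min(xs, key=f)            # optimal in the reduced space
        span = (hi - lo) / steps
        lo, hi = best_x - span, best_x + span  # next, smaller space
    return best_x
```

Each round is cheap (steps + 1 evaluations), and stopping after any round yields a usable approximate answer, matching the anytime flavor of flexible computation.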
Efficient Progressive Sampling
Abstract
Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size rarely is obvious. We analyze methods for progressive sampling: using progressively larger samples as long as model accuracy improves. We explore several notions of efficient progressive sampling. We analyze efficiency relative to induction with all instances; we show that a simple, geometric sampling schedule is asymptotically optimal, and we describe how best to take into account prior expectations of accuracy convergence. We then describe the issues involved in instantiating an efficient progressive sampler, including how to detect convergence. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling can be remarkably efficient.
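On convergence detection, which this abstract flags as a key instantiation issue: a simple check (an assumption of ours, not necessarily the detection method the paper evaluates) fits a least-squares line to the most recent (sample size, accuracy) points and declares convergence when the slope is negligible:

```python
def has_converged(sizes, accs, window=3, tol=1e-6):
    """Fit a least-squares slope to the last `window` (size, accuracy)
    points; report convergence when the learning curve is nearly flat.
    Assumes sizes are strictly increasing (so the denominator is nonzero)."""
    if len(sizes) < window:
        return False
    xs, ys = sizes[-window:], accs[-window:]
    mx, my = sum(xs) / window, sum(ys) / window
    denom = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / denom
    return abs(slope) < tol
```

A progressive sampler would call this after each schedule step and stop growing the sample once it returns True.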
unknown title
Abstract
Having access to massive amounts of data does not necessarily imply that induction algorithms must use them all. Samples often provide the same accuracy with far less computational cost. However, the correct sample size rarely is obvious. We analyze methods for progressive sampling using progressively larger samples as long as model accuracy improves. We explore several notions of efficient progressive sampling. We analyze efficiency relative to induction with all instances; we show that a simple, geometric sampling schedule is asymptotically optimal, and we describe how best to take into account prior expectations of accuracy convergence. We then describe the issues involved in instantiating an efficient progressive sampler, including how to detect convergence. Finally, we provide empirical results comparing a variety of progressive sampling methods. We conclude that progressive sampling can be remarkably efficient.

1 Introduction

Induction algorithms face competing requirements for accuracy and efficiency. The requirement for accurate models often demands the use of large data sets that allow algorithms to discover complex structure and make accurate parameter estimates. The requirement for efficient induction demands the use of small data sets, because the computational complexity of even the most efficient induction algorithms is linear in the number of instances, and most algorithms are considerably less efficient. In this paper we study progressive sampling methods, which attempt to maximize accuracy as efficiently as possible. Progressive sampling starts with a small sample and uses progressively larger ones until model accuracy no longer improves. A central component of progressive sampling is a sampling schedule S =