
## Algorithm Selection for Combinatorial Search Problems: A Survey

### Citations

3647 | Bagging predictors
- Breiman
- 1996
Citation Context ...rge body of work that is relevant to Algorithm Selection in the Machine Learning literature. Smith-Miles (2008a) presents a survey of many approaches. Repeating this here is unnecessary and outside the scope of this paper, which focuses on the application of such techniques. The most relevant area of research is that into ensembles, where several models are created instead of one. Such ensembles are either implicitly assumed or explicitly engineered so that they complement each other. Errors made by one model are corrected by another. Ensembles can be engineered by techniques such as bagging (Breiman, 1996) and boosting (Schapire, 1990). Bauer and Kohavi (1999), Opitz and Maclin (1999) present studies that compare bagging and boosting empirically. Dietterich (2000) provides explanations for why ensembles can perform better than individual algorithms. There is increasing interest in the integration of Algorithm Selection techniques with programming language paradigms (e.g. Ansel, Chan, Wong, Olszewski, Zhao, Edelman, & Amarasinghe, 2009; Hoos, 2012). While these issues are sufficiently relevant to be mentioned here, exploring them in detail is outside the scope of the paper. Similarly, technical ... |
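The bagging technique cited in this context is easy to illustrate. Below is a minimal sketch of bagging in Python, assuming a toy one-dimensional threshold classifier as the unstable base model; the names `train_stump` and `bag` are illustrative, not from any cited system:

```python
import random

def train_stump(data):
    """Fit a one-dimensional threshold classifier by exhaustive search:
    predict `left` below the threshold and `right` at or above it."""
    best = None
    for t in sorted(x for x, _ in data):
        for left, right in ((0, 1), (1, 0)):
            correct = sum(1 for x, y in data
                          if (left if x < t else right) == y)
            if best is None or correct > best[0]:
                best = (correct, t, left, right)
    _, t, left, right = best
    return lambda x: left if x < t else right

def bag(data, n_models=25, seed=0):
    """Bagging: train each base model on a bootstrap resample of the
    training data and predict by majority vote over all models."""
    rng = random.Random(seed)
    models = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_models)]
    def predict(x):
        votes = [m(x) for m in models]
        return max(set(votes), key=votes.count)
    return predict
```

The majority vote is where one model's errors are corrected by another, which is the complementarity the excerpt describes; boosting differs in that later models are trained to focus on earlier models' mistakes.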

707 | An empirical comparison of voting classification algorithms: bagging, boosting and variants
- Bauer, Kohavi
- 1999
Citation Context ... Selection in the Machine Learning literature. Smith-Miles (2008a) presents a survey of many approaches. Repeating this here is unnecessary and outside the scope of this paper, which focuses on the application of such techniques. The most relevant area of research is that into ensembles, where several models are created instead of one. Such ensembles are either implicitly assumed or explicitly engineered so that they complement each other. Errors made by one model are corrected by another. Ensembles can be engineered by techniques such as bagging (Breiman, 1996) and boosting (Schapire, 1990). Bauer and Kohavi (1999), Opitz and Maclin (1999) present studies that compare bagging and boosting empirically. Dietterich (2000) provides explanations for why ensembles can perform better than individual algorithms. There is increasing interest in the integration of Algorithm Selection techniques with programming language paradigms (e.g. Ansel, Chan, Wong, Olszewski, Zhao, Edelman, & Amarasinghe, 2009; Hoos, 2012). While these issues are sufficiently relevant to be mentioned here, exploring them in detail is outside the scope of the paper. Similarly, technical issues arising from the computation, storage and applic... |

680 | Where the really hard problems are - Cheeseman, Kanefsky, et al. - 1991 |

624 | Ensemble methods in machine learning
- Dietterich
Citation Context ...ng this here is unnecessary and outside the scope of this paper, which focuses on the application of such techniques. The most relevant area of research is that into ensembles, where several models are created instead of one. Such ensembles are either implicitly assumed or explicitly engineered so that they complement each other. Errors made by one model are corrected by another. Ensembles can be engineered by techniques such as bagging (Breiman, 1996) and boosting (Schapire, 1990). Bauer and Kohavi (1999), Opitz and Maclin (1999) present studies that compare bagging and boosting empirically. Dietterich (2000) provides explanations for why ensembles can perform better than individual algorithms. There is increasing interest in the integration of Algorithm Selection techniques with programming language paradigms (e.g. Ansel, Chan, Wong, Olszewski, Zhao, Edelman, & Amarasinghe, 2009; Hoos, 2012). While these issues are sufficiently relevant to be mentioned here, exploring them in detail is outside the scope of the paper. Similarly, technical issues arising from the computation, storage and application of performance models, the integration of Algorithm Selection techniques into complex systems, the e... |

162 | PRODIGY: an integrated architecture for planning and learning.
- Carbonell, Etzioni, et al.
- 1991
Citation Context ... cannot be changed easily. Closely related is the work by Lagoudakis and Littman (2000, 2001), which partitions the search space into recursive subtrees and selects the best algorithm from the portfolio for every subtree. They specifically consider recursive algorithms. At each recursive call, the Algorithm Selection procedure is invoked. This is a more natural extension of offline systems than monitoring the execution of the selected algorithms, as the same mechanisms can be used. Samulowitz and Memisevic (2007) also select algorithms for recursively solving sub-problems. The PRODIGY system (Carbonell, Etzioni, Gil, Joseph, Knoblock, Minton, & Veloso, 1991) selects the next operator to apply in order to reach the goal state of a planning problem at each node in the search tree. Similarly, Langley (1983a) learns weights for operators that can be applied at each search state and selects from among them accordingly. Most approaches rely on an offline element that makes a decision before search starts. In the case of recursive calls, however, this is no different from making a decision during search. Gagliolo et al. (2004), Gagliolo and Schmidhuber (2005, 2006b) on the other hand learn the Algorithm Selection model only dynamically while the problem i... |

160 | Algorithm portfolios
- Gomes, Selman
- 2001
Citation Context ... (2004), Yu and Rauchwerger (2006) and simulation (Wang & Tropper, 2007; Ewald et al., 2010). It should be noted that a significant part of Machine Learning research is concerned with developing Algorithm Selection techniques; the publications listed in this paragraph are the most relevant that use the specific techniques and framework surveyed here. Some publications consider more than one application domain. Stern et al. (2010) choose the best algorithm for Quantified Boolean Formulae and combinatorial auctions. Allen and Minton (1996), Kroer and Malitsky (2011) look at SAT and constraints. Gomes and Selman (2001) consider SAT and Mixed Integer Programming. In addition to these two domains, Kadioglu et al. (2010) also investigate set covering problems. Streeter and Smith (2008) apply their approach to SAT, Integer Programming and planning. Gagliolo and Schmidhuber (2011), Kotthoff et al. (2012), Kotthoff (2012a) compare the performance across Algorithm Selection problems from constraints, Quantified Boolean Formulae and SAT. In most cases, researchers take some steps to adapt their approaches to the application domain. This is usually done by using domain-specific features, such as the number of constr... |

150 | ParamILS: an automatic algorithm configuration framework. - Hutter, Hoos, et al. - 2009 |

139 | An economics approach to hard computational problems
- Huberman, Lukose, et al.
- 1997
Citation Context ... parameters. In Rice’s notation, the algorithm space A is constant, finite and known. This approach is used for example in SATzilla (Nudelman et al., 2004; Xu, Hutter, Hoos, & Leyton-Brown, 2007; Xu et al., 2008), AQME (Pulina & Tacchella, 2007, 2009), CPhydra (O’Mahony, Hebrard, Holland, Nugent, & O’Sullivan, 2008), ArgoSmArT (Nikolic, Maric, & Janicic, 2009) and BUS (Howe, Dahlman, Hansen, Scheetz, & von Mayrhauser, 1999). The vast majority of approaches composes static portfolios from different algorithms or different algorithm configurations. Huberman et al. (1997) however use a portfolio that contains the same randomised algorithm twice. They run the portfolio in parallel and as such essentially use the technique to parallelise an existing sequential algorithm. Some approaches use a large number of algorithms in the portfolio, such as ArgoSmArT, whose portfolio size is 60. SATzilla uses 19 algorithms, although the authors use portfolios containing only subsets of those for specific applications. BUS uses six algorithms and CPhydra five. Gent, Jefferson, Kotthoff, Miguel, Moore, Nightingale, and Petrie (2010a) select from a portfolio of only two algorit... |
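The Huberman et al. (1997) idea this context describes, running copies of the same randomised algorithm in parallel with different seeds and finishing when the luckiest copy does, can be sketched as follows. The toy Las Vegas search and all function names here are illustrative assumptions, not code from the cited paper:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def randomized_search(seed, target, n_bits=10, max_tries=100000):
    """A toy Las Vegas algorithm: guess random bit strings until the
    hidden target is hit; its runtime varies heavily with the seed."""
    rng = random.Random(seed)
    for tries in range(1, max_tries + 1):
        if rng.getrandbits(n_bits) == target:
            return tries
    return max_tries

def parallel_portfolio(target, n_copies=4):
    """Run n_copies of the identical randomised algorithm in parallel
    with different seeds; the portfolio's wall-clock cost is the
    runtime of the fastest copy."""
    with ThreadPoolExecutor(max_workers=n_copies) as pool:
        runtimes = list(pool.map(
            lambda s: randomized_search(s, target), range(n_copies)))
    return min(runtimes)
```

Because the runtime distribution of the randomised algorithm has high variance, the minimum over several independent runs is typically far below the expected runtime of a single run, which is the portfolio effect the excerpt points to.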

118 | Generalizing from case studies: A case study.
- Aha
- 1992
Citation Context ...inatorial search problems, the application of Algorithm Selection techniques has resulted in significant performance improvements that leverage the diversity of systems and techniques developed in recent years. This paper surveys the available literature and describes how research has progressed. Researchers have long ago recognised that a single algorithm will not give the best performance across all problems one may want to solve and that selecting the most appropriate method is likely to improve the overall performance. Empirical evaluations have provided compelling evidence for this (e.g. Aha, 1992; Wolpert & Macready, 1997). The original description of the Algorithm Selection Problem was published in Rice (1976). The basic model described in the paper is very simple – given a space of problems and a space of algorithms, map each problem-algorithm pair to its performance. This mapping can then be used to select the best algorithm for a given problem. The original figure that illustrates the model is reproduced in Figure 1 on the following page. As Rice states, “The objective is to determine S(x) [the mapping of problems to algorithms] so as to have high algorithm performance.” He identi... |

104 | Groups of diverse problem solvers can outperform groups of high-ability problem solvers.
- Hong, Page
- 2004
Citation Context ...hat the one with eight algorithms offers the best performance, as it has more variety than the portfolio with two algorithms and it is easier to make a choice for eight than for 16 algorithms. There are also approaches that use portfolios of variable size that is determined by training data (Kadioglu, Malitsky, Sellmann, & Tierney, 2010; Xu, Hoos, & Leyton-Brown, 2010). As the algorithms in the portfolio do not change, their selection is crucial for its success. Ideally, the algorithms will complement each other such that good performance can be achieved on a wide range of different problems. Hong and Page (2004) report that portfolios composed of a random selection from a large pool of diverse algorithms outperform portfolios composed of the algorithms with the best overall performance. They develop a framework with a mathematical model that theoretically justifies this observation. Samulowitz and Memisevic (2007) use a portfolio of heuristics for solving quantified Boolean formulae problems that have specifically been crafted to be orthogonal to each other. Xu et al. (2010) automatically engineer a portfolio with algorithms of complementary strengths. In Xu, Hutter, Hoos, and Leyton-Brown (2012), th... |
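The contrast this context draws, portfolios of complementary algorithms versus portfolios of the individually best algorithms, can be sketched with a greedy marginal-contribution construction. This is an illustrative scheme in the spirit of the automatic portfolio engineering the excerpt mentions, not the exact method of any cited paper:

```python
def best_overall(runtimes, k):
    """Baseline: pick the k algorithms with the best mean runtime
    over the training instances."""
    algs = sorted(runtimes, key=lambda a: sum(runtimes[a]) / len(runtimes[a]))
    return algs[:k]

def complementary(runtimes, k):
    """Greedy construction: repeatedly add the algorithm that most
    reduces the portfolio's oracle (per-instance best) total runtime."""
    def oracle(port):
        n_instances = len(next(iter(runtimes.values())))
        return sum(min(runtimes[a][i] for a in port)
                   for i in range(n_instances))
    chosen = []
    while len(chosen) < k:
        best = min((a for a in runtimes if a not in chosen),
                   key=lambda a: oracle(chosen + [a]))
        chosen.append(best)
    return chosen
```

On training data where two algorithms are strong on disjoint instance sets, the greedy construction keeps both specialists while the mean-runtime baseline can pick two algorithms that win on the same instances.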

103 | Automatically configuring constraint satisfaction programs: A case study. - Minton, S - 1996 |

93 | Self adapting linear algebra algorithms and software
- Demmel, Dongarra, et al.
Citation Context ...ch node of a search tree, or when the system judges it to be necessary to make a decision. Rice’s model assumes that only a single algorithm A ∈ A is selected. It implicitly assumes that this selection occurs only once and before solving the actual problem. 3.1 What to select A common and the simplest approach is to select a single algorithm from the portfolio and use it to solve the problem completely. This single algorithm has been determined to be the best for the problem at hand. For example SATzilla (Nudelman et al., 2004; Xu et al., 2007, 2008), ArgoSmArT (Nikolic et al., 2009), SALSA (Demmel, Dongarra, Eijkhout, Fuentes, Petitet, Vuduc, Whaley, & Yelick, 2005) and Eureka (Cook & Varnell, 1997) do this. The disadvantage of this approach is that there is no way of mitigating a wrong selection. If an algorithm is chosen that exhibits bad performance on the problem, the system is “stuck” with it and no adjustments are made, even if all other portfolio algorithms would perform much better. An alternative approach is to compute schedules for running (a subset of) the algorithms in the portfolio. In some approaches, the terms portfolio and schedule are used synonymously – all algorithms in the portfolio are selected and run according to a sch... |
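The two options this context contrasts, committing to a single algorithm versus computing a schedule over the portfolio, can be sketched in a few lines. The proportional-to-inverse-runtime split below is one illustrative scheduling rule, not the policy of any cited system:

```python
def select_single(predicted):
    """Winner-take-all: commit to the algorithm with the best predicted
    runtime; there is no recourse if the prediction is wrong."""
    return min(predicted, key=predicted.get)

def schedule(predicted, budget):
    """Alternative: split a time budget across all portfolio algorithms,
    here proportional to 1/predicted runtime, so that a wrong selection
    is mitigated by the other portfolio members."""
    weight = {a: 1.0 / t for a, t in predicted.items()}
    total = sum(weight.values())
    return {a: budget * w / total for a, w in weight.items()}
```

With `predicted = {"A": 10.0, "B": 5.0}`, `select_single` commits everything to `B`, while `schedule(predicted, 30.0)` still gives `A` a third of the budget as insurance against a bad prediction.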

81 | Petabricks: A language and compiler for algorithmic choice
- Ansel, Chan, et al.
- 2009
Citation Context ...h ensembles are either implicitly assumed or explicitly engineered so that they complement each other. Errors made by one model are corrected by another. Ensembles can be engineered by techniques such as bagging (Breiman, 1996) and boosting (Schapire, 1990). Bauer and Kohavi (1999), Opitz and Maclin (1999) present studies that compare bagging and boosting empirically. Dietterich (2000) provides explanations for why ensembles can perform better than individual algorithms. There is increasing interest in the integration of Algorithm Selection techniques with programming language paradigms (e.g. Ansel, Chan, Wong, Olszewski, Zhao, Edelman, & Amarasinghe, 2009; Hoos, 2012). While these issues are sufficiently relevant to be mentioned here, exploring them in detail is outside the scope of the paper. Similarly, technical issues arising from the computation, storage and application of performance models, the integration of Algorithm Selection techniques into complex systems, the execution of choices and the collection of experimental data to facilitate Algorithm Selection are not surveyed here. 1.3 Terminology Algorithm Selection is a widely applicable concept and as such has cropped up frequently in various lines of research. Often, different termino... |

79 | Phase transitions and the search problem - Hogg, Huberman, et al. - 1996 |

78 | Learning the empirical hardness of optimization problems: The case of combinatorial auctions.
- Leyton-Brown, Nudelman, et al.
- 2002
Citation Context ...self is wildly inaccurate as long as it is correct relative to the other predictions. This is the approach that is implicitly assumed in Rice’s framework. The prediction is the performance mapping P (A, x) for an algorithm A ∈ A on a problem x ∈ P. Models for each algorithm in the portfolio are used for example by Xu et al. (2008), Howe et al. (1999), Allen and Minton (1996), Lobjois and Lemaître (1998), Gagliolo and Schmidhuber (2006b). A common way of doing this is to use regression to directly predict the performance of each algorithm. This is used by Xu et al. (2008), Howe et al. (1999), Leyton-Brown et al. (2002), Haim and Walsh (2009), Roberts and Howe (2007). The performance of the algorithms in the portfolio is evaluated on a set of training problems, and a relationship between the characteristics of a problem and the performance of an algorithm is derived. This relationship usually has the form of a simple formula that is cheap to compute at runtime. Silverthorn and Miikkulainen (2010) on the other hand learn latent class models of unobserved variables to capture relationships between solvers, problems and run durations. Based on the predictions, the expected utility is computed and used to select an... |
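The per-algorithm regression approach this context describes can be sketched with ordinary least squares on a single problem feature. The feature, the training data and the function names are assumptions for illustration; real systems use many features and richer models:

```python
def fit_line(points):
    """Ordinary least squares for one feature: runtime ~ a*x + b."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return lambda x: a * x + b

def train_selector(training):
    """One regression model per portfolio algorithm, mapping a problem
    feature to predicted runtime; at solve time, pick the algorithm
    with the smallest prediction."""
    models = {alg: fit_line(points) for alg, points in training.items()}
    return lambda feature: min(models, key=lambda a: models[a](feature))
```

As the excerpt notes, the absolute predictions may be inaccurate; only their relative order matters for the final `min`.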

77 | High-level optimization via automated statistical modeling.
- Brewer
- 1995
Citation Context ... the portfolio algorithms. Kotthoff et al. (2012) use statistical relational learning to directly predict the ranking instead of deriving it from other predictions. Howe et al. (1999), Gagliolo et al. (2004), Gagliolo and Schmidhuber (2006b), Roberts and Howe (2006), O’Mahony et al. (2008) predict resource allocations for the algorithms in the portfolios. Gebruers et al. (2005), Little, Gebruers, Bridge, and Freuder (2002), Borrett and Tsang (2001) consider selecting the most appropriate formulation of a constraint problem. Smith and Setliff (1992), Brewer (1995), Wilson et al. (2000), Balasubramaniam et al. (2012) select algorithms and data structures to be used in a software system. Some types of predictions require online approaches that make decisions during search. Borrett et al. (1996), Sakkout et al. (1996), Carchrae and Beck (2004), Armstrong et al. (2006) predict when to switch the algorithm used to solve a problem. Horvitz et al. (2001) predict whether to restart an algorithm. Lagoudakis and Littman (2000, 2001) predict the cost to solve a sub-problem. However, most online approaches make predictions that can also be used in offline settings... |

76 | COMPOSER: a probabilistic solution to the utility problem in Speed-Up learning.
- Gratch, DeJong
- 1992
Citation Context ... performance are hard to understand. This makes it hard to assess whether a learned abstract model is appropriate and what its requirements and limitations are. Explicitly-learned models try to identify the concepts that affect performance for a given problem. This acquired knowledge can be made explicit to improve researchers’ understanding of the problem domain. There are several Machine Learning techniques that facilitate this, as the learned models are represented in a form that is easy to understand by humans. Carbonell et al. (1991), Gratch and DeJong (1992), Brodley (1993), Vrakas et al. (2003) learn classification rules that guide the selector. Vrakas et al. (2003) note that the decision to use a classification rule learner was not so much guided by the performance of the approach, but the easy interpretability of the result. Langley (1983a), Epstein et al. (2002), Nareyek (2001) learn weights for decision rules to guide the selector towards the best algorithms. Cook and Varnell (1997), Guerri and Milano (2004), Guo and Hsu (2004), Roberts and Howe (2006), Bhowmick, Eijkhout, Freund, Fuentes, and Keyes (2006), Gent et al. (2010a) go one step fur... |
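The human-readable classification rules this context emphasises can be illustrated with a minimal one-rule learner: for a single discrete problem feature, map each observed value to the algorithm that won most often under it. The scheme and names are illustrative, not the learner of any cited paper, but the resulting rule table is directly readable, which is the interpretability argument the excerpt makes:

```python
from collections import Counter, defaultdict

def one_r(training):
    """Learn a one-feature rule table from (feature_value, best_alg)
    pairs: each feature value maps to the algorithm that was best
    most often when that value was observed."""
    wins = defaultdict(Counter)
    for feature_value, best_alg in training:
        wins[feature_value][best_alg] += 1
    return {v: c.most_common(1)[0][0] for v, c in wins.items()}
```

The learned dictionary, e.g. `{"small": "A", "large": "B"}`, is itself the explicit model: it can be printed in a paper, as Pfahringer et al. do with their rule sets.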

76 | Automatic algorithm configuration based on local search. - Hutter, Hoos, et al. - 2007 |

75 | A bayesian approach to tackling hard computational problems.
- Horvitz, Ruan, et al.
- 2001
Citation Context ...lgorithms in parallel. Related research is concerned with the scheduling of restarts of stochastic algorithms – it also investigates the best way of allocating resources. The paper that introduced algorithm portfolios (Huberman et al., 1997) uses a portfolio of identical stochastic algorithms that are run with different random seeds. There is a large amount of research on how to determine restart schedules for randomised algorithms and a survey of this is outside the scope of this paper. A few approaches that are particularly relevant to Algorithm Selection and portfolios are mentioned below. Horvitz et al. (2001) determine the amount of time to allocate to a stochastic algorithm before restarting it. They use dynamic policies that take performance predictions into account, showing that it can outperform an optimal fixed policy. Cicirello and Smith (2005) investigate a restart model that allocates resources to an algorithm proportional to the number of times it has been successful in the past. In particular, they note that the allocated resources should grow doubly exponentially in the number of successes. Allocation of fewer resources results in over-exploration (too many different things are tr... |

74 | Addressing the selective superiority problem: Automatic Algorithm/Model class selection.
- Brodley
- 1993
Citation Context ...m hybrid algorithm for the combination of a set of algorithms and an Algorithm Selection model (which they term selector). In Machine Learning, Algorithm Selection is usually referred to as meta-learning. This is because Algorithm Selection models for Machine Learning learn when to use which method of Machine Learning. The earliest approaches also spoke of hybrid approaches (e.g. Utgoff, 1988). Aha (1992) proposes rules for selecting a Machine Learning algorithm that take the characteristics of a data set into account. He uses the term meta-learning. Brodley (1993) introduces the notion of selective superiority. This concept refers to a particular algorithm being best on some, but not all tasks. In addition to the many terms used for the process of Algorithm Selection, researchers have also used different terminology for the models of what Rice calls performance measure space. Allen and Minton (1996) call them runtime performance predictors. Leyton-Brown, Nudelman, and Shoham (2002), Hutter, Hamadi, Hoos, and Leyton-Brown (2006), Xu, Hoos, and Leyton-Brown (2007), Leyton-Brown, Nudelman, and Shoham (2009) coined the term Empirical Hardness model. This s... |

65 | Algorithm selection using reinforcement learning.
- Lagoudakis, Littman
- 2000
Citation Context ...hes that make decisions during search, for example at every node of the search tree, are necessarily online systems. Arbelaez, Hamadi, and Sebag (2009) select the best search strategy at checkpoints in the search tree. Similarly, Brodley (1993) recursively partitions the classification problem to be solved and selects an algorithm for each partition. In this approach, a lower-level decision can lead to changing the decision at the level above. This is usually not possible for combinatorial search problems, as decisions at a higher level cannot be changed easily. Closely related is the work by Lagoudakis and Littman (2000, 2001), which partitions the search space into recursive subtrees and selects the best algorithm from the portfolio for every subtree. They specifically consider recursive algorithms. At each recursive call, the Algorithm Selection procedure is invoked. This is a more natural extension of offline systems than monitoring the execution of the selected algorithms, as the same mechanisms can be used. Samulowitz and Memisevic (2007) also select algorithms for recursively solving sub-problems. The PRODIGY system (Carbonell, Etzioni, Gil, Joseph, Knoblock, Minton, & Veloso, 1991) selects the next op... |

64 | Algorithm portfolio design: Theory vs. practice.
- Gomes, Selman
- 1997
Citation Context ...in most cases, they can be measured directly. In Machine Learning, ensembles (Dietterich, 2000) are instances of algorithm portfolios. In fact, the only difference between algorithm portfolios and Machine Learning ensembles is the way in which its constituents are used. The idea of algorithm portfolios was first presented by Huberman, Lukose, and Hogg (1997). They describe a formal framework for the construction and application of algorithm portfolios and evaluate their approach on graph colouring problems. Within the Artificial Intelligence community, algorithm portfolios were popularised by Gomes and Selman (1997a, 1997b) and a subsequent extended investigation (Gomes & Selman, 2001). The technique itself however had been described under different names by other authors at about the same time in different contexts. Tsang et al. (1995) experimentally show for a selection of constraint satisfaction algorithms and heuristics that none is the best on all evaluated problems. They do not mention portfolios, but propose that future research should focus on identifying when particular algorithms and heuristics deliver the best performance. This implicitly assumes a portfolio to choose algorithms from. Allen a... |

62 | A Gender-Based genetic algorithm for the automatic configuration of algorithms. - Ansotegui, Sellmann, et al. - 2009 |

61 | Performance prediction and automated tuning of randomized and parametric algorithms.
- Hutter, Hamadi, et al.
- 2006
Citation Context ...diction. Weerawarana et al. (1996) propose an approach that uses neural networks in addition to the Bayesian belief propagation approach they describe initially. Cook and Varnell (1997) compare different decision tree learners, a Bayesian classifier, a nearest neighbour approach and a neural network. They chose the C4.5 decision tree inducer because even though it may be outperformed by a neural network, the learned trees are easily understandable by humans and may provide insight into the problem domain. Leyton-Brown et al. (2002) compare several versions of linear and non-linear regression. Hutter et al. (2006) report having explored support vector machine regression, multivariate adaptive regression splines (MARS) and lasso regression before deciding to use the linear regression approach of Leyton-Brown et al. (2002). They also report experimental results with sequential Bayesian linear regression and Gaussian Process regression. Guo (2003), Guo and Hsu (2004) explore using decision trees, naïve Bayes rules, Bayesian networks and meta-learning techniques. They also chose the C4.5 decision tree inducer because it is one of the top performers and creates models that are easy to understand and quick ... |

56 | Understanding random SAT: beyond the Clauses-to-Variables ratio
- Nudelman, Leyton-Brown, et al.
- 2004
Citation Context ...portfolios are constructed offline before any problems are solved. While solving a problem, the composition of the portfolio and the algorithms within it do not change. Dynamic portfolios change in composition, configuration of the constituent algorithms or both during solving. 2.1 Static portfolios Static portfolios are the most common type. The number of algorithms or systems in the portfolio is fixed, as well as their parameters. In Rice’s notation, the algorithm space A is constant, finite and known. This approach is used for example in SATzilla (Nudelman et al., 2004; Xu, Hutter, Hoos, & Leyton-Brown, 2007; Xu et al., 2008), AQME (Pulina & Tacchella, 2007, 2009), CPhydra (O’Mahony, Hebrard, Holland, Nugent, & O’Sullivan, 2008), ArgoSmArT (Nikolic, Maric, & Janicic, 2009) and BUS (Howe, Dahlman, Hansen, Scheetz, & von Mayrhauser, 1999). The vast majority of approaches composes static portfolios from different algorithms or different algorithm configurations. Huberman et al. (1997) however use a portfolio that contains the same randomised algorithm twice. They run the portfolio in parallel and as such essentially use the technique to parallelise an exis... |

41 | Instance-Specific Algorithm Configuration.
- Malitsky
- 2012
Citation Context ...010a) select from a portfolio of only two algorithms. AQME has different versions with different portfolio sizes, one with 16 algorithms, one with five and three algorithms of different types and one with two algorithms (Pulina & Tacchella, 2009). The authors compare the different portfolios and conclude that the one with eight algorithms offers the best performance, as it has more variety than the portfolio with two algorithms and it is easier to make a choice for eight than for 16 algorithms. There are also approaches that use portfolios of variable size that is determined by training data (Kadioglu, Malitsky, Sellmann, & Tierney, 2010; Xu, Hoos, & Leyton-Brown, 2010). As the algorithms in the portfolio do not change, their selection is crucial for its success. Ideally, the algorithms will complement each other such that good performance can be achieved on a wide range of different problems. Hong and Page (2004) report that portfolios composed of a random selection from a large pool of diverse algorithms outperform portfolios composed of the algorithms with the best overall performance. They develop a framework with a mathematical model that theoretically justifies this observation. Samulowitz and Memisevic (2007) use a por... |

41 | Choosing search heuristics by non-stationary reinforcement learning
- Nareyek
- 2003
Citation Context ...e explicit to improve researchers’ understanding of the problem domain. There are several Machine Learning techniques that facilitate this, as the learned models are represented in a form that is easy to understand by humans. Carbonell et al. (1991), Gratch and DeJong (1992), Brodley (1993), Vrakas et al. (2003) learn classification rules that guide the selector. Vrakas et al. (2003) note that the decision to use a classification rule learner was not so much guided by the performance of the approach, but the easy interpretability of the result. Langley (1983a), Epstein et al. (2002), Nareyek (2001) learn weights for decision rules to guide the selector towards the best algorithms. Cook and Varnell (1997), Guerri and Milano (2004), Guo and Hsu (2004), Roberts and Howe (2006), Bhowmick, Eijkhout, Freund, Fuentes, and Keyes (2006), Gent et al. (2010a) go one step further and learn decision trees. Guo and Hsu (2004) again note that the reason for choosing decision trees was not primarily the performance, but the understandability of the result. Pfahringer, Bensusan, and Giraud-Carrier (2000) show the set of learned rules in the paper to illustrate its compactness. Similarly, Gent et al. (20... |

40 | The adaptive constraint engine.
- Epstein, Freuder, et al.
- 2002
Citation Context ...rithm space A changes with each problem and is a subspace of the potentially infinite super algorithm space A′. This space contains all possible (hypothetical) algorithms that could be used to solve problems from the problem space. In static portfolios, the algorithms in the portfolio are selected from A′ once either manually by the designer of the portfolio or automatically based on empirical results from training data. One approach is to build a portfolio by combining algorithmic building blocks. An example of this is the Adaptive Constraint Engine (ACE) (Epstein & Freuder, 2001; Epstein, Freuder, Wallace, Morozov, & Samuels, 2002). The building blocks are so-called advisors, which characterise variables of the constraint problem and give recommendations as to which one to process next. ACE combines these advisors into more complex ones. Elsayed and Michel (2010, 2011) use a similar idea to construct search strategies for solving constraint problems. Fukunaga (2002, 2008) proposes CLASS, which combines heuristic building blocks to form composite heuristics for solving SAT problems. In these approaches, there is no strong notion of a portfolio – the algorithm or strategy used to solve a problem is assembled from lower l... |

37 | Applying machine learning to Low-Knowledge control of optimization algorithms. - Carchrae, Beck - 2005 |

37 | Branch and bound algorithm selection by performance prediction. - Lobjois, Lemaître - 1998 |

36 | Adaptive constraint satisfaction: The quickest first principle.
- Borrett, Tsang, et al.
- 1996
Citation Context ... bad choices. Their advantage however is that they only need to select an algorithm once and incur no overhead while the problem is being solved. Moving towards online systems, the next step is to monitor the execution of an algorithm or a schedule to be able to intervene if expectations are not met. Fink (1997, 1998) investigates setting a time bound for the algorithm that has been selected based on the predicted performance. If the time bound is exceeded, the solution attempt is abandoned. More sophisticated systems furthermore adjust their selection if such a bound is exceeded. Borrett et al. (1996) try to detect behaviour during search that indicates that the algorithm is performing badly, for example visiting nodes in a subtree of the search that clearly do not lead to a solution. If such behaviour is detected, they propose switching the currently running algorithm according to a fixed replacement list. Sakkout, Wallace, and Richards (1996) explore the same basic idea. They switch between two algorithms for solving constraint problems that achieve different levels of consistency. The level of consistency refers to the amount of search space that is ruled out by inference before actuall... |
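The monitoring-and-switching scheme this context describes, a time bound plus a fixed replacement list, can be sketched as below. The `run` callable is a hypothetical runner (returning a solution, or None on timeout) introduced only for illustration; it stands in for whatever bounded execution mechanism a real system uses:

```python
def solve_with_switching(problem, replacement_list, time_bound, run):
    """Run the selected algorithm under a time bound; if the bound is
    exceeded, abandon the attempt and switch to the next algorithm on
    the fixed replacement list, as in the Borrett et al. (1996) style
    scheme described in the text."""
    for alg in replacement_list:
        solution = run(alg, problem, time_bound)  # None means timeout
        if solution is not None:
            return alg, solution
    return None, None  # every algorithm on the list timed out
```

The first entry of `replacement_list` is the offline selection; the rest is the online safety net that offline-only systems lack.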

35 | Learning techniques for automatic algorithm portfolio selection. - Guerri, A, et al. - 2004 |

34 | The max k-armed bandit: A new model of exploration applied to search heuristic selection.
- Cicirello, Smith
- 2005
Citation Context ... uses a portfolio of identical stochastic algorithms that are run with different random seeds. There is a large amount of research on how to determine restart schedules for randomised algorithms and a survey of this is outside the scope of this paper. A few approaches that are particularly relevant to Algorithm Selection and portfolios are mentioned below. Horvitz et al. (2001) determine the amount of time to allocate to a stochastic algorithm before restarting it. They use dynamic policies that take performance predictions into account, showing that it can outperform an optimal fixed policy. Cicirello and Smith (2005) investigate a restart model that allocates resources to an algorithm proportional to the number of times it has been successful in the past. In particular, they note that the allocated resources should grow doubly exponentially in the number of successes. Allocation of fewer resources results in over-exploration (too many different things are tried and not enough resources given to each) and allocation of more resources in over-exploitation (something is tried for too long before moving on to something different). Streeter, Golovin, and Smith (2007b) compute restart schedules that ta...
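The doubly exponential growth rule mentioned here can be illustrated with a toy allocation function. The base of 2 is an arbitrary choice for illustration; the cited model is more elaborate.

```python
def allocation(successes, base=2):
    """Resources granted to an algorithm grow doubly exponentially in its
    number of past successes: base ** (base ** successes)."""
    # Growing the allocation more slowly leads to over-exploration,
    # growing it faster to over-exploitation.
    return base ** (base ** successes)

schedule = [allocation(s) for s in range(4)]  # [2, 4, 16, 256]
```

Note how quickly the allocation concentrates resources on an algorithm with a track record of success.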

33 | How to solve it automatically: Selection among Problem-Solving methods.
- Fink
- 1998
Citation Context ...t al. (2001) predict whether to restart an algorithm. Lagoudakis and Littman (2000, 2001) predict the cost to solve a sub-problem. However, most online approaches make predictions that can also be used in offline settings, such as the best algorithm to proceed with. The primary selection criterion and prediction for Soares et al. (2004) and Leite et al. (2010) is the quality of the solution an algorithm produces rather than the time it takes the algorithm to find that solution. In addition to the primary selection criteria, a number of approaches predict secondary criteria. Howe et al. (1999), Fink (1998), Roberts and Howe (2007) predict the probability of success for each algorithm. Weerawarana et al. (1996) predict the quality of a solution. In Rice’s model, the prediction of an Algorithm Selection system is the performance p ∈ Rn of an algorithm. This abstract notion does not rely on time and is applicable to many approaches. It does not fit techniques that predict the portfolio algorithm to choose or more complex measures such as a schedule however. As Rice developed his approach long before the advent of algorithm portfolios, it should not be surprising that the notion of the performance ...

33 | Automated discovery of composite SAT variable-selection heuristics.
- Fukunaga
- 2002
Citation Context ...omatically based on empirical results from training data. One approach is to build a portfolio by combining algorithmic building blocks. An example of this is the Adaptive Constraint Engine (ACE) (Epstein & Freuder, 2001; Epstein, Freuder, Wallace, Morozov, & Samuels, 2002). The building blocks are so-called advisors, which characterise variables of the constraint problem and give recommendations as to which one to process next. ACE combines these advisors into more complex ones. Elsayed and Michel (2010, 2011) use a similar idea to construct search strategies for solving constraint problems. Fukunaga (2002, 2008) proposes CLASS, which combines heuristic building blocks to form composite heuristics for solving SAT problems. In these approaches, there is no strong notion of a portfolio – the algorithm or strategy used to solve a problem is assembled from lower level components. Closely related is the concept of specialising generic building blocks for the problem to solve. This approach is taken in the SAGE system (Strategy Acquisition Governed by Experimentation) (Langley, 1983b, 1983a). It starts with a set of general operators that can be applied to a search state. These operators are refined ... |

33 | Exploiting competitive planner performance. - Howe, Dahlman, et al. - 1999 |

32 | Algorithm selection and scheduling. - Kadioglu, Malitsky, et al. - 2011 |

31 | Automated Configuration of Algorithms for Solving Hard Computational Problems.
- Hutter
- 2009
Citation Context ...s to apply when. Heuristics are not necessarily complete or deterministic, i.e. they are not guaranteed to find a solution if it exists or to always make the same decision under the same circumstances. The nature of heuristics makes them particularly amenable to Algorithm Selection – choosing a heuristic manually is difficult even for experts, but choosing the correct one can improve performance significantly. Several doctoral dissertations with related work chapters that survey the literature on Algorithm Selection have been produced. Examples of the more recent ones include Streeter (2007), Hutter (2009), Carchrae (2009), Gagliolo (2010), Ewald (2010), Kotthoff (2012b), Malitsky (2012). Smith-Miles (2008a) presents a survey with similar aims. It looks at the Algorithm Selection Problem from the Machine Learning point of view and focuses on seeing Algorithm Selection as a learning problem. As a consequence, great detail is given for aspects that are relevant to Machine Learning. In this paper, we take a more practical point of view and focus on techniques that facilitate and implement Algorithm Selection systems. We are furthermore able to take more recent work in this fast-moving area into acc...

29 | Dynamic problem structure analysis as a basis for constraint-directed scheduling heuristics.
- Beck, Fox
- 2000
Citation Context ...(Horvitz et al., 2001), the runtime environment (Armstrong et al., 2006), structures derived from the problem such as the primal graph of a constraint problem (Gebruers et al., 2004; Guerri & Milano, 2004; Gent et al., 2010a), specific parts of the problem model such as variables (Epstein & Freuder, 2001), the algorithms in the portfolio themselves (Hough & Williams, 2006) or the domain of the problem to be solved (Carbonell et al., 1991). Gerevini et al. (2009) rely on the problem domain as the only problem-specific feature and select based on past performance data for the particular domain. Beck and Fox (2000) consider not only the values of properties of a problem, but the changes of those values while the problem is being solved. Smith and Setliff (1992) consider features of abstract representations of the algorithms. Yu et al. (2004), Yu and Rauchwerger (2006) use features that represent technical details of the behaviour of an algorithm on a problem, such as the type of computations done in a loop. Most approaches use features that are applicable to all problems of the application domain they are considering. However, Horvitz et al. (2001) use features that are not only specific to their applic...

29 | An automatically configurable portfolio-based planner with macro-actions: PbP.
- Gerevini, Saetti, et al.
- 2009
Citation Context ...ll of their computed schedules contain all portfolio algorithms. Streeter, Golovin, and Smith (2007a) compute a schedule with the aim of improving the average-case performance. In later work, they compute theoretical guarantees for the performance of their schedule (Streeter & Smith, 2008). Wu and van Beek (2007) approach scheduling the chosen algorithms in a different way and assume a fixed limit on the amount of resources an algorithm can consume while solving a problem. All algorithms are run sequentially for this fixed amount of time. Similar to Gerevini et al. (2009), they simulate the performance of different allocations and select the best one based on the results of these simulations. Fukunaga (2000) estimates the performance of candidate allocations through bootstrap sampling. Gomes and Selman (1997a, 2001) also evaluate the performance of different candidate portfolios, but take into account how many algorithms can be run in parallel. They demonstrate that the optimal schedule (in this case the number of algorithms that are being run) changes as the number of available processors increases. Gagliolo and Schmidhuber (2008) investigate how to allocate...
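Selecting an allocation by simulating candidates on recorded performance data, in the spirit of the approach described above, might look like this two-algorithm sketch. The data layout and the grid search over splits are assumptions made for illustration.

```python
def best_fixed_allocation(runtimes, total, step):
    """Grid-search the split of a fixed time budget between two algorithms,
    simulated on recorded training runtimes.

    runtimes: list of dicts, each mapping algorithm name -> runtime on one
    training problem (exactly two algorithms assumed here).
    Returns ((time_for_a, time_for_b), problems_solved).
    """
    a, b = sorted(runtimes[0])
    best = None
    t = 0.0
    while t <= total:
        # a problem is solved if either algorithm finishes within its share
        solved = sum(1 for r in runtimes if r[a] <= t or r[b] <= total - t)
        if best is None or solved > best[1]:
            best = ((t, total - t), solved)
        t += step
    return best
```

For instance, with one problem only "A" can solve quickly and one only "B" can, the search finds a split that covers both rather than giving the whole budget to either.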

29 | Learning to select branching rules in the DPLL procedure for satisfiability.
- Lagoudakis, Littman
- 2001
Citation Context ...thm Selection techniques have been applied. Over the years, Algorithm Selection systems have been used in many different application domains. These range from Mathematics, e.g. differential equations (Kamel, Enright, & Ma, 1993; Weerawarana et al., 1996), linear algebra (Demmel et al., 2005) and linear systems (Bhowmick et al., 2006; Kuefler & Chen, 2008), to the selection of algorithms and data structures in software design (Smith & Setliff, 1992; Cahill, 1994; Brewer, 1995; Wilson et al., 2000). A very common application domain is combinatorial search problems such as SAT (Xu et al., 2008; Lagoudakis & Littman, 2001; Silverthorn & Miikkulainen, 2010), constraints (Minton, 1996; Epstein et al., 2002; O’Mahony et al., 2008), Mixed Integer Programming (Xu, Hutter, Hoos, & Leyton-Brown, 2011), Quantified Boolean Formulae (Pulina & Tacchella, 2009; Stern et al., 2010), planning (Carbonell et al., 1991; Howe et al., 1999; Vrakas et al., 2003), scheduling (Beck & Fox, 2000; Beck & Freuder, 2004; Cicirello & Smith, 2005), combinatorial auctions (Leyton-Brown et al., 2002; Gebruers et al., 2004; Gagliolo & Schmidhuber, 2006b), Answer Set Programming (Gebser, Kaminski, Kaufmann, Schaub, Schneider, & Ziller, 2011),...

26 | A comparison of ranking methods for classification algorithm selection.
- Brazdil, Soares
- 2000
Citation Context ...ns. Weerawarana, Houstis, Rice, Joshi, and Houstis (1996) use Bayesian belief propagation to predict the runtime of a particular algorithm on a particular problem. Bayesian inference is used to determine the class of a problem and the closest case in the knowledge base. A performance profile is extracted from that and used to estimate the runtime. The authors also propose an alternative approach that uses neural nets. Fink (1997, 1998) computes the expected gain for time bounds based on past success times. The computed values are used to choose the algorithm and the time bound for running it. Brazdil and Soares (2000) compare algorithm rankings based on different past performance statistics. Similarly, Leite, Brazdil, Vanschoren, and Queiros (2010) maintain a ranking based on past performance. Cicirello and Smith (2005) propose a bandit problem model that governs the allocation of resources to each algorithm in the portfolio. Wang and Tropper (2007) also use a bandit model, but furthermore evaluate a Q-learning approach, where in addition to bandit model rewards, the states of the system are taken into account. Gomes and Selman (1997a), Wu and van Beek (2007), Gerevini et al. (2009) use the past performanc... |
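A generic bandit-style allocation loop — not the exact max k-armed bandit model of the cited work, just an epsilon-greedy illustration of rewarding past success — can be sketched as:

```python
import random

def select(counts, rewards, epsilon, rng):
    """Epsilon-greedy choice among portfolio algorithms: usually pick the
    best observed mean reward, occasionally explore at random. Untried
    algorithms are preferred (infinite optimistic estimate)."""
    algorithms = sorted(counts)
    if rng.random() < epsilon:
        return rng.choice(algorithms)
    return max(algorithms,
               key=lambda a: rewards[a] / counts[a] if counts[a] else float("inf"))

def update(counts, rewards, algorithm, reward):
    """Record the outcome of one run of the chosen algorithm."""
    counts[algorithm] += 1
    rewards[algorithm] += reward
```

With `epsilon=0` the loop is purely greedy; raising `epsilon` trades exploitation for exploration, the tension the excerpt describes.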

26 | Learning effective search heuristics.
- Langley
- 1983
Citation Context ...es. Elsayed and Michel (2010, 2011) use a similar idea to construct search strategies for solving constraint problems. Fukunaga (2002, 2008) proposes CLASS, which combines heuristic building blocks to form composite heuristics for solving SAT problems. In these approaches, there is no strong notion of a portfolio – the algorithm or strategy used to solve a problem is assembled from lower level components. Closely related is the concept of specialising generic building blocks for the problem to solve. This approach is taken in the SAGE system (Strategy Acquisition Governed by Experimentation) (Langley, 1983b, 1983a). It starts with a set of general operators that can be applied to a search state. These operators are refined by making the preconditions more specific based on their utility for finding a solution. The Multi-tac (Multi-tactic Analytic Compiler) system (Minton, 1993b, 1993a, 1996) specialises a set of generic heuristics for the constraint problem to solve. There can be complex restrictions on how the building blocks are combined. RT-Syn (Smith & Setliff, 1992) for example uses a preprocessing step to determine the possible combinations of algorithms and data structures to solve a sof... |

26 | Empirical hardness models: Methodology and a case study on combinatorial auctions. - Leyton-Brown, Nudelman, et al. - 2009 |

25 | Programming by optimization.
- Hoos
- 2012
Citation Context ... so that they complement each other. Errors made by one model are corrected by another. Ensembles can be engineered by techniques such as bagging (Breiman, 1996) and boosting (Schapire, 1990). Bauer and Kohavi (1999), Opitz and Maclin (1999) present studies that compare bagging and boosting empirically. Dietterich (2000) provides explanations for why ensembles can perform better than individual algorithms. There is increasing interest in the integration of Algorithm Selection techniques with programming language paradigms (e.g. Ansel, Chan, Wong, Olszewski, Zhao, Edelman, & Amarasinghe, 2009; Hoos, 2012). While these issues are sufficiently relevant to be mentioned here, exploring them in detail is outside the scope of the paper. Similarly, technical issues arising from the computation, storage and application of performance models, the integration of Algorithm Selection techniques into complex systems, the execution of choices and the collection of experimental data to facilitate Algorithm Selection are not surveyed here. 1.3 Terminology Algorithm Selection is a widely applicable concept and as such has cropped up frequently in various lines of research. Often, different terminologies are us... |

24 | A portfolio solver for answer set programming: preliminary report.
- Gebser, Kaminski, et al.
- 2011
Citation Context ...oblems such as SAT (Xu et al., 2008; Lagoudakis & Littman, 2001; Silverthorn & Miikkulainen, 2010), constraints (Minton, 1996; Epstein et al., 2002; O’Mahony et al., 2008), Mixed Integer Programming (Xu, Hutter, Hoos, & Leyton-Brown, 2011), Quantified Boolean Formulae (Pulina & Tacchella, 2009; Stern et al., 2010), planning (Carbonell et al., 1991; Howe et al., 1999; Vrakas et al., 2003), scheduling (Beck & Fox, 2000; Beck & Freuder, 2004; Cicirello & Smith, 2005), combinatorial auctions (Leyton-Brown et al., 2002; Gebruers et al., 2004; Gagliolo & Schmidhuber, 2006b), Answer Set Programming (Gebser, Kaminski, Kaufmann, Schaub, Schneider, & Ziller, 2011), the Travelling Salesperson Problem (Fukunaga, 2000) and general search algorithms (Langley, 1983b; Cook & Varnell, 1997; Lobjois & Lemaître, 1998). Other domains include Machine Learning (Soares et al., 2004; Leite et al., 2010), the most probable explanation problem (Guo & Hsu, 2004), parallel reduction algorithms (Yu et al., 2004; Yu & Rauchwerger, 2006) and simulation (Wang & Tropper, 2007; Ewald et al., 2010). It should be noted that a significant part of Machine Learning research is concerned with developing Algorithm Selection techniques; the publications listed in this paragraph a...

23 | Learning dynamic algorithm portfolios.
- Gagliolo, Schmidhuber
- 2006
Citation Context ...onstraint problems. Vrakas et al. (2003) learn rules automatically, but then filter them manually. A more common approach today is to automatically learn performance models using Machine Learning on training data. The portfolio algorithms are run on a set of representative problems and based on these experimental results, performance models are built. This approach is used by Xu et al. (2008), Pulina and Tacchella (2007), O’Mahony et al. (2008), Kadioglu et al. (2010), Guerri and Milano (2004), to name but a few examples. A drawback of this approach is that the training time is usually large. Gagliolo and Schmidhuber (2006a) investigate ways of mitigating this problem by using censored sampling, which introduces an upper bound on the runtime of each experiment in the training phase. Kotthoff, Gent, and Miguel (2012) also investigate censored sampling where not all algorithms are run on all problems in the training phase. Their results show that censored sampling may not have a significant effect on the performance of the learned model. Models can also be built without a separate training phase, but while the problem is solved. This approach is used by Gagliolo and Schmidhuber (2006b), Armstrong et al. (2006) fo... |
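Censored sampling as described here — capping each training run and recording whether the cap was hit — reduces to something like the following. The `measure` harness is a hypothetical stand-in for actually running an algorithm on a problem.

```python
def censor(runtime, cap):
    """One censored observation: the run is stopped at the cap and flagged
    so the model learner can treat the truncated value specially."""
    return (min(runtime, cap), runtime > cap)

def collect(measure, algorithms, problems, cap):
    """Training data for all algorithm/problem pairs under a runtime cap.

    `measure(algorithm, problem)` is assumed to return the (possibly very
    long) true runtime; in practice the run would simply be killed at cap.
    """
    return {(a, p): censor(measure(a, p), cap)
            for a in algorithms for p in problems}
```

The upper bound keeps the total training time manageable, at the cost of only knowing a lower bound on the runtime of censored runs.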

23 | Using CBR to select solution strategies in constraint programming.
- Gebruers, Hnich, et al.
- 2005
Citation Context ... of 19 different Machine Learning classifiers on an Algorithm Selection problem in constraint programming. The investigation is extended to include more Machine Learning algorithms as well as different performance models and more problem domains in Kotthoff et al. (2012). They identify several Machine Learning algorithms that show particularly good performance across different problem domains, namely linear regression and alternating decision trees. They do not consider issues such as how easy the models are to understand or how efficient they are to compute. Only Guo and Hsu (2004), Gebruers et al. (2005), Hough and Williams (2006), Pulina and Tacchella (2007), Silverthorn and Miikkulainen (2010), Gent et al. (2010b), Kotthoff et al. (2012) quantify the differences in performance of the methods they used. The other comparisons give only qualitative evidence. Not all comparisons choose one of the approaches over the other or provide sufficient detail to enable the reader to do so. In cases where a particular technique is chosen, performance is often not the only selection criterion. In particular, the ability to understand a learned model plays a significant role. 4.2 Types of predictions The w...

22 | ODEXPERT: an expert system to select numerical solvers for initial value ODE systems.
- Kamel, Enright, et al.
- 1993
Citation Context ...0a) explicitly exclude features that are expensive to compute. 6. Application domains The approaches for solving the Algorithm Selection Problem that have been surveyed here are usually not specific to a particular application domain, within combinatorial search problems or otherwise. Nevertheless this survey would not be complete without a brief exposition of the various contexts in which Algorithm Selection techniques have been applied. Over the years, Algorithm Selection systems have been used in many different application domains. These range from Mathematics, e.g. differential equations (Kamel, Enright, & Ma, 1993; Weerawarana et al., 1996), linear algebra (Demmel et al., 2005) and linear systems (Bhowmick et al., 2006; Kuefler & Chen, 2008), to the selection of algorithms and data structures in software design (Smith & Setliff, 1992; Cahill, 1994; Brewer, 1995; Wilson et al., 2000). A very common application domain is combinatorial search problems such as SAT (Xu et al., 2008; Lagoudakis & Littman, 2001; Silverthorn & Miikkulainen, 2010), constraints (Minton, 1996; Epstein et al., 2002; O’Mahony et al., 2008), Mixed Integer Programming (Xu, Hutter, Hoos, & Leyton-Brown, 2011), Quantified Boolean Form...

21 | A context for constraint satisfaction problem formulation selection.
- Borrett, Tsang
- 2001
Citation Context ...as the runtime performance. Brazdil and Soares (2000), Soares, Brazdil, and Kuba (2004), Leite et al. (2010) produce rankings of the portfolio algorithms. Kotthoff et al. (2012) use statistical relational learning to directly predict the ranking instead of deriving it from other predictions. Howe et al. (1999), Gagliolo et al. (2004), Gagliolo and Schmidhuber (2006b), Roberts and Howe (2006), O’Mahony et al. (2008) predict resource allocations for the algorithms in the portfolios. Gebruers et al. (2005), Little, Gebruers, Bridge, and Freuder (2002), Borrett and Tsang (2001) consider selecting the most appropriate formulation of a constraint problem. Smith and Setliff (1992), Brewer (1995), Wilson et al. (2000), Balasubramaniam et al. (2012) select algorithms and data structures to be used in a software system. Some types of predictions require online approaches that make decisions during search. Borrett et al. (1996), Sakkout et al. (1996), Carchrae and Beck (2004), Armstrong et al. (2006) predict when to switch the algorithm used to solve a problem. Horvitz et al. (2001) predict whether to restart an algorithm. Lagoudakis and Littman (2000, 2001) predict the co...

20 | A Meta-Heuristic factory for vehicle routing problems.
- Caseau, Laburthe, et al.
- 1999
Citation Context ...ad actions based on computation time. Kuefler and Chen (2008) follow a very similar approach that also takes success or failure into account. Carchrae and Beck (2004, 2005) monitor the solution quality during search. They decide whether to switch the current algorithm based on this by changing the allocation of resources. Wei et al. (2008) monitor a feature that is specific to their application domain, the distribution of clause weights in SAT, during search and use it to decide whether to switch a heuristic. Stergiou (2009) monitors propagation events in a constraint solver with a similar aim. Caseau et al. (1999) evaluate the performance of candidate algorithms in terms of number of calls to a specific high-level procedure. They note that in contrast to using the runtime, their approach is machine-independent. 5.3 Feature selection The features used for learning the Algorithm Selection model are crucial to its success. Uninformative features might prevent the model learner from recognising the real relation between problem and performance or the most important feature might be missing. Many researchers have recognised this problem. Howe et al. (1999) manually select the most important features. They f...

20 | An analytic learning system for specializing heuristics.
- Minton
- 1993
Citation Context ...re is no strong notion of a portfolio – the algorithm or strategy used to solve a problem is assembled from lower level components. Closely related is the concept of specialising generic building blocks for the problem to solve. This approach is taken in the SAGE system (Strategy Acquisition Governed by Experimentation) (Langley, 1983b, 1983a). It starts with a set of general operators that can be applied to a search state. These operators are refined by making the preconditions more specific based on their utility for finding a solution. The Multi-tac (Multi-tactic Analytic Compiler) system (Minton, 1993b, 1993a, 1996) specialises a set of generic heuristics for the constraint problem to solve. There can be complex restrictions on how the building blocks are combined. RT-Syn (Smith & Setliff, 1992) for example uses a preprocessing step to determine the possible combinations of algorithms and data structures to solve a software specification problem and then selects the most appropriate combination using simulated annealing. Balasubramaniam, Gent, Jefferson, Kotthoff, Miguel, and Nightingale (2012) model the construction of a constraint solver from components as a constraint problem whose solu... |

20 | Integrating heuristics for constraint satisfaction problems: A case study.
- Minton
- 1993
Citation Context ...re is no strong notion of a portfolio – the algorithm or strategy used to solve a problem is assembled from lower level components. Closely related is the concept of specialising generic building blocks for the problem to solve. This approach is taken in the SAGE system (Strategy Acquisition Governed by Experimentation) (Langley, 1983b, 1983a). It starts with a set of general operators that can be applied to a search state. These operators are refined by making the preconditions more specific based on their utility for finding a solution. The Multi-tac (Multi-tactic Analytic Compiler) system (Minton, 1993b, 1993a, 1996) specialises a set of generic heuristics for the constraint problem to solve. There can be complex restrictions on how the building blocks are combined. RT-Syn (Smith & Setliff, 1992) for example uses a preprocessing step to determine the possible combinations of algorithms and data structures to solve a software specification problem and then selects the most appropriate combination using simulated annealing. Balasubramaniam, Gent, Jefferson, Kotthoff, Miguel, and Nightingale (2012) model the construction of a constraint solver from components as a constraint problem whose solu... |

19 | Selecting the right heuristic algorithm: Runtime performance predictors.
- Allen, Minton
- 1996
Citation Context ...ch method of Machine Learning. The earliest approaches also spoke of hybrid approaches (e.g. Utgoff, 1988). Aha (1992) proposes rules for selecting a Machine Learning algorithm that take the characteristics of a data set into account. He uses the term meta-learning. Brodley (1993) introduces the notion of selective superiority. This concept refers to a particular algorithm being best on some, but not all tasks. In addition to the many terms used for the process of Algorithm Selection, researchers have also used different terminology for the models of what Rice calls performance measure space. Allen and Minton (1996) call them runtime performance predictors. Leyton-Brown, Nudelman, and Shoham (2002), Hutter, Hamadi, Hoos, and Leyton-Brown (2006), Xu, Hoos, and Leyton-Brown (2007), Leyton-Brown, Nudelman, and Shoham (2009) coined the term Empirical Hardness model. This stresses the reliance on empirical data to create these models and introduces the notion of hardness of a problem. The concept of hardness takes into account all performance considerations and does not restrict itself to, for example, runtime performance. In practice however, the described empirical hardness models only take runtime performa... |
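A minimal stand-in for an empirical hardness model — fitting log-runtime against a single instance feature by least squares — illustrates the reliance on empirical data that the term stresses. Real empirical hardness models use many features and richer regressors; this is a one-feature sketch only.

```python
import math

def fit_hardness_model(features, runtimes):
    """Fit log(runtime) = a + b * feature by least squares and return a
    predictor of runtime for unseen feature values."""
    n = len(features)
    logs = [math.log(t) for t in runtimes]
    mean_x = sum(features) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, logs))
    sxx = sum((x - mean_x) ** 2 for x in features)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # predictions are made in log space and exponentiated back
    return lambda x: math.exp(a + b * x)
```

Working in log space reflects the common observation that runtimes span orders of magnitude across instances.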

18 | Low-Knowledge algorithm control.
- Carchrae, Beck
- 2004
Citation Context ...imilarly, Gent et al. (2010a) show their final decision tree in the paper. Some approaches learn probabilistic models that take uncertainty and variability into account. Gratch and DeJong (1992) use a probabilistic model to learn control rules. The probabilities for candidate rules being beneficial are evaluated and updated on a training set until a threshold is reached. This methodology is used to avoid having to evaluate candidate rules on larger training sets, which would show their utility more clearly but be more expensive. Demmel et al. (2005) learn multivariate Bayesian decision rules. Carchrae and Beck (2004) learn a Bayesian classifier to predict the best algorithm after a certain amount of time. Stern, Samulowitz, Herbrich, Graepel, Pulina, and Tacchella (2010) learn Bayesian models that incorporate collaborative filtering. Domshlak, Karpas, and Markovitch (2010) learn decision rules using naïve Bayes classifiers. Lagoudakis and Littman (2000), Petrik (2005) learn performance models based on Markov Decision Processes. Kotthoff et al. (2012) use statistical relational learning to predict the ranking of the algorithms in the portfolio on a particular problem. None of these approaches make explici...

18 | To max or not to max: Online learning for speeding up optimal planning. - Domshlak, Karpas, et al. - 2010 |

17 | How to get a free lunch: A simple cost model for machine learning applications.
- Domingos
- 1998
Citation Context ...e right algorithms on one part of the problem space, wrong decisions will be made on other parts, leading to a loss of performance. On average over all problems, the performance achieved by an Algorithm Selection meta-algorithm will be the same as that of all other algorithms. The NFL theorems are the source of some controversy however. Among the researchers to doubt their applicability is the first proponent of the Algorithm Selection Problem (Rice & Ramakrishnan, 1999). Several other publications show that the assumptions underlying the NFL may not be satisfied (Rao, Gordon, & Spears, 1995; Domingos, 1998). In particular, the distribution of the best algorithms from the portfolio to problems is not random – it is certainly true that certain algorithms are the best on a much larger number of problems than others. A detailed assessment of the applicability of the NFL theorems to the Algorithm Selection Problem is outside the scope of this paper. However, a review of the literature suggests that, if the theorems are applicable, the ramifications in practice may not be significant. Most of the many publications surveyed here do achieve performance improvements across a range of different problems u... |

16 | An evaluation of machine learning in algorithm selection for search problems.
- Kotthoff, Gent, et al.
- 2012
Citation Context ... larger training sets, which would show their utility more clearly but be more expensive. Demmel et al. (2005) learn multivariate Bayesian decision rules. Carchrae and Beck (2004) learn a Bayesian classifier to predict the best algorithm after a certain amount of time. Stern, Samulowitz, Herbrich, Graepel, Pulina, and Tacchella (2010) learn Bayesian models that incorporate collaborative filtering. Domshlak, Karpas, and Markovitch (2010) learn decision rules using naïve Bayes classifiers. Lagoudakis and Littman (2000), Petrik (2005) learn performance models based on Markov Decision Processes. Kotthoff et al. (2012) use statistical relational learning to predict the ranking of the algorithms in the portfolio on a particular problem. None of these approaches make explicit use of the uncertainty attached to a decision though. Other approaches include support vector machines (Hough & Williams, 2006; Arbelaez et al., 2009), reinforcement learning (Armstrong et al., 2006), neural networks (Gagliolo & Schmidhuber, 2005), decision tree ensembles (Hough & Williams, 2006), ensembles of general classification algorithms (Kotthoff, Miguel, & Nightingale, 2010), boosting (Bhowmick et al., 2006), hybrid approaches th...

14 | Maximizing the benefits of parallel search using machine learning.
- Cook, Varnell
- 1997
Citation Context ...different mapping with the same performance or an even better mapping is secondary. While it is easy to determine the theoretically best selection mapping on a set of given problems, casting this mapping into a generalisable form that will give good performance on new problems or even into a form that can be used in practice is hard. Indeed, Guo (2003) shows that the Algorithm Selection Problem in general is undecidable. It may be better to choose a mapping that generalises well rather than the one with the best performance. Other considerations can be involved as well. Guo and Hsu (2004) and Cook and Varnell (1997) compare different Algorithm Selection models and select not the one with the best performance, but one with good performance that is also easy to understand, for example. [Figure 2: Refined model for the Algorithm Selection Problem with problem features (Rice, 1976): problem space x ∈ P, feature space f(x) ∈ F = Rm, algorithm space A ∈ A, performance measure space p ∈ Rn with ‖p‖ the algorithm performance; S(f(x)) is the selection mapping and p(A, x) the performance mapping.] Vrakas, Tsoumakas, Bassiliades, and Vlahavas (2003) select their method of choice for ...

13 | Making choices using structure at the instance level within a case based reasoning framework.
- Gebruers, Guerri, et al.
- 2004
Citation Context ...number of clauses/constraints/goals of a particular type (for example the number of alldifferent constraints, Gent et al., 2010b), • ratios of several of the above features and summary statistics. Such features are used for example in O’Mahony et al. (2008), Pulina and Tacchella (2007), Weerawarana et al. (1996), Howe et al. (1999), Xu et al. (2008). Other sources of features include the generator that produced the problem to be solved (Horvitz et al., 2001), the runtime environment (Armstrong et al., 2006), structures derived from the problem such as the primal graph of a constraint problem (Gebruers et al., 2004; Guerri & Milano, 2004; Gent et al., 2010a), specific parts of the problem model such as variables (Epstein & Freuder, 2001), the algorithms in the portfolio themselves (Hough & Williams, 2006) or the domain of the problem to be solved (Carbonell et al., 1991). Gerevini et al. (2009) rely on the problem domain as the only problem-specific feature and select based on past performance data for the particular domain. Beck and Fox (2000) consider not only the values of properties of a problem, but the changes of those values while the problem is being solved. Smith and Setliff (1992) consider fea...

13 | Learning when to use lazy learning in constraint solving.
- Gent, Jefferson, et al.
- 2010
Citation Context ...he algorithm that was actually selected. Predicting the time required to analyse a problem is a closely related idea. If the predicted required analysis time is too high, a default algorithm with reasonable performance is chosen and run on the problem. This technique is particularly important in cases where the problem is hard to analyse, but easy to solve. As some systems use information that comes from exploring part of the search space (cf. Section 5), this is a very relevant concern in practice. On some problems, even probing just a tiny part of the search space may take a very long time. Gent et al. (2010a), Gent, Kotthoff, Miguel, and Nightingale (2010b) report that using the misclassification penalty as a weight for the individual problems during training improves the quality of the predictions. The misclassification penalty quantifies the “badness” of a wrong prediction; in this case as the additional time required to solve a problem. If an algorithm was chosen that is only slightly worse than the best one, it has less impact than choosing an algorithm that is orders of magnitude worse. Using the penalty during training is a way of guiding the learned model towards the problems where the po... |
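
The misclassification penalty described above can be sketched in a few lines; the runtime matrix and the choice of the worst-case penalty as the instance weight are illustrative assumptions.

```python
# Sketch: misclassification penalty as a per-instance training weight,
# following the idea reported by Gent et al. (2010a, 2010b). The runtimes
# and the weighting scheme below are illustrative assumptions.

def misclassification_penalties(runtimes):
    """Per instance, the penalty of choosing each algorithm: its runtime
    minus the runtime of the best algorithm on that instance."""
    return [[t - min(row) for t in row] for row in runtimes]

def instance_weights(runtimes):
    """Weight each instance by its worst-case penalty, so training focuses
    on instances where a wrong choice costs orders of magnitude more."""
    return [max(p) for p in misclassification_penalties(runtimes)]
```

An algorithm only slightly worse than the best yields a small penalty; one that is orders of magnitude worse dominates the instance weight.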

12 | Simple rules for low-knowledge algorithm selection.
- Beck, Freuder
- 2004
Citation Context ...ar algorithm is good, i.e. leading towards a solution, or bad, i.e. straying from the path to a solution if the solution is known or revisiting an earlier search state if the solution is not known. Gomes and Selman (1997a, 2001) use the runtime distributions of algorithms over the size of a problem, as measured by the number of backtracks. Fink (1998) uses the past success times of an algorithm as candidate time bounds on new problems. Brazdil and Soares (2000) do not consider the runtime, but the error rate of algorithms. Gerevini et al. (2009) use both computation time and solution quality. Beck and Freuder (2004), Carchrae and Beck (2004, 2005) evaluate the performance also during search. They explicitly focus on features that do not require a lot of domain knowledge. Beck and Freuder (2004) note that, “While existing algorithm selection techniques have shown impressive results, their knowledge-intensive nature means that domain and algorithm expertise is necessary to develop the models. The overall requirement for expertise has not been reduced: it has been shifted from algorithm selection to predictive model building.” They do, like several other approaches, assume anytime algorithms – after search ... |

12 | Application of machine learning in selecting sparse linear solvers.
- Bhowmick, Eijkhout, et al.
- 2006
Citation Context ...v Decision Processes. Kotthoff et al. (2012) use statistical relational learning to predict the ranking of the algorithms in the portfolio on a particular problem. None of these approaches make explicit use of the uncertainty attached to a decision though. Other approaches include support vector machines (Hough & Williams, 2006; Arbelaez et al., 2009), reinforcement learning (Armstrong et al., 2006), neural networks (Gagliolo & Schmidhuber, 2005), decision tree ensembles (Hough & Williams, 2006), ensembles of general classification algorithms (Kotthoff, Miguel, & Nightingale, 2010), boosting (Bhowmick et al., 2006), hybrid approaches that combine regression and classification (Kotthoff, 2012a), multinomial logistic regression (Samulowitz & Memisevic, 2007), self-organising maps (Smith-Miles, 2008b) and clustering (Stamatatos & Stergiou, 2009; Stergiou, 2009; Kadioglu et al., 2010). Sayag et al. (2006), Streeter et al. (2007a) compute schedules for running the algorithms in the portfolio based on a statistical model of the problem instance distribution and performance data for the algorithms. This is not an exhaustive list, but focuses on the most prominent approaches and publications. Within a single fa... |

10 | Collaborative learning for constraint solving.
- Epstein, Freuder
- 2001
Citation Context ...em to be solved. The algorithm space A changes with each problem and is a subspace of the potentially infinite super algorithm space A′. This space contains all possible (hypothetical) algorithms that could be used to solve problems from the problem space. In static portfolios, the algorithms in the portfolio are selected from A′ once either manually by the designer of the portfolio or automatically based on empirical results from training data. One approach is to build a portfolio by combining algorithmic building blocks. An example of this is the Adaptive Constraint Engine (ACE) (Epstein & Freuder, 2001; Epstein, Freuder, Wallace, Morozov, & Samuels, 2002). The building blocks are so-called advisors, which characterise variables of the constraint problem and give recommendations as to which one to process next. ACE combines these advisors into more complex ones. Elsayed and Michel (2010, 2011) use a similar idea to construct search strategies for solving constraint problems. Fukunaga (2002, 2008) proposes CLASS, which combines heuristic building blocks to form composite heuristics for solving SAT problems. In these approaches, there is no strong notion of a portfolio – the algorithm or strat...

10 | Automatic Algorithm Selection for Complex Simulation Problems.
- Ewald
- 2010
Citation Context ...omplete or deterministic, i.e. they are not guaranteed to find a solution if it exists or to always make the same decision under the same circumstances. The nature of heuristics makes them particularly amenable to Algorithm Selection – choosing a heuristic manually is difficult even for experts, but choosing the correct one can improve performance significantly. Several doctoral dissertations with related work chapters that survey the literature on Algorithm Selection have been produced. Examples of the more recent ones include Streeter (2007), Hutter (2009), Carchrae (2009), Gagliolo (2010), Ewald (2010), Kotthoff (2012b), Malitsky (2012). Smith-Miles (2008a) presents a survey with similar aims. It looks at the Algorithm Selection Problem from the Machine Learning point of view and focuses on seeing Algorithm Selection as a learning problem. As a consequence, great detail is given for aspects that are relevant to Machine Learning. In this paper, we take a more practical point of view and focus on techniques that facilitate and implement Algorithm Selection systems. We are furthermore able to take more recent work in this fast-moving area into account. In contrast to most other work surveying A...

10 | A neural network model for Inter-Problem adaptive online time allocation.
- Gagliolo, Schmidhuber
- 2005
Citation Context ...ithms for recursively solving sub-problems. The PRODIGY system (Carbonell, Etzioni, Gil, Joseph, Knoblock, Minton, & Veloso, 1991) selects the next operator to apply in order to reach the goal state of a planning problem at each node in the search tree. Similarly, Langley (1983a) learn weights for operators that can be applied at each search state and select from among them accordingly. Most approaches rely on an offline element that makes a decision before search starts. In the case of recursive calls, this is no different from making a decision during search however. Gagliolo et al. (2004), Gagliolo and Schmidhuber (2005, 2006b) on the other hand learn the Algorithm Selection model only dynamically while the problem is being solved. Initially, all algorithms in the portfolio are allocated the same (small) time slice. As search progresses, the allocation strategy is updated, giving more resources to algorithms that have exhibited better performance. The expected fastest algorithm receives half of the total time, the next best algorithm half of the remaining time and so on. Armstrong, Christen, McCreath, and Rendell (2006) also rely exclusively on a selection model trained online in a similar fashion. They eval... |
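
The allocation strategy described here (half the total time to the expected fastest algorithm, half of the remainder to the next, and so on) can be sketched as below; handing the leftover budget to the last algorithm is an assumption made to keep the schedule exhaustive.

```python
# Sketch of the geometric time allocation in the dynamic approach of
# Gagliolo et al.: the expected fastest algorithm gets half the budget,
# the next half of the remainder, and so on. Giving the leftover time to
# the last algorithm is an illustrative assumption.

def allocate(ranked_algorithms, total_time):
    """ranked_algorithms: best-first ranking. Returns {algorithm: time slice}."""
    allocation, remaining = {}, float(total_time)
    for i, a in enumerate(ranked_algorithms):
        if i == len(ranked_algorithms) - 1:
            allocation[a] = remaining      # last algorithm absorbs the rest
        else:
            remaining /= 2.0
            allocation[a] = remaining
    return allocation
```

As the ranking is updated during search, recomputing the allocation shifts resources towards the algorithms that have exhibited better performance.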

10 | Algorithm Selection for Sorting and Probabilistic Inference: A Machine Learning-Based Approach.
- Guo
- 2003
Citation Context ...tion of functions. The questions for existence and uniqueness of a best selection mapping are usually irrelevant in practice. As long as a good performance mapping is found and improves upon the current state of the art, the question of whether there is a different mapping with the same performance or an even better mapping is secondary. While it is easy to determine the theoretically best selection mapping on a set of given problems, casting this mapping into a generalisable form that will give good performance on new problems or even into a form that can be used in practice is hard. Indeed, Guo (2003) shows that the Algorithm Selection Problem in general is undecidable. It may be better to choose a mapping that generalises well rather than the one with the best performance. Other considerations can be involved as well. Guo and Hsu (2004) and Cook and Varnell (1997) compare different Algorithm Selection models. [Figure 2: Refined model for the Algorithm Selection Problem with problem features (Rice, 1976); problem space x ∈ P, feature space f(x) ∈ F = R^m, algorithm space A ∈ A, performance measure space p ∈ R^n, ‖p‖ = algorithm performance, selection mapping S(f(x)), performance mapping p(A, x)] ...

10 | A Learning-Based algorithm selection meta-reasoner for the Real-Time MPE problem.
- Guo, Hsu
- 2004
Citation Context ... of whether there is a different mapping with the same performance or an even better mapping is secondary. While it is easy to determine the theoretically best selection mapping on a set of given problems, casting this mapping into a generalisable form that will give good performance on new problems or even into a form that can be used in practice is hard. Indeed, Guo (2003) shows that the Algorithm Selection Problem in general is undecidable. It may be better to choose a mapping that generalises well rather than the one with the best performance. Other considerations can be involved as well. Guo and Hsu (2004) and Cook and Varnell (1997) compare different Algorithm Selection models [Figure 2: Refined model for the Algorithm Selection Problem with problem features (Rice, 1976); problem space x ∈ P, feature space f(x) ∈ F = R^m, algorithm space A ∈ A, performance measure space p ∈ R^n, ‖p‖ = algorithm performance, selection mapping S(f(x)), performance mapping p(A, x)] and select not the one with the best performance, but one with good performance that is also easy to understand, for example. Vrakas, Tsoumakas, Bassiliades, and Vlahavas (2003) select...

10 | Parallel algorithm configuration.
- Hutter, Hoos, et al.
- 2012
Citation Context ...e a racing approach to avoid having to run all generated configurations to completion. They also note that one of the advantages of the genetic algorithm approach is that it is inherently parallel. Both of these approaches are capable of tuning algorithms with a large number of parameters and possible values as well as taking interactions between parameters into account. They are used in practice in the Algorithm Selection systems Hydra and ISAC, respectively. In both cases, they are only used to construct static portfolios however. More recent approaches focus on exploiting parallelism (e.g. Hutter, Hoos, & Leyton-Brown, 2012). Dynamic portfolios are in general a more fruitful area for Algorithm Selection research because of the large space of possible decisions. Static portfolios are usually relatively small and the decision space is amenable for human exploration. This is not a feasible approach for dynamic portfolios though. Minton (1996) notes that “Multi-tac turned out to have an unexpected advantage in this arena, due to the complexity of the task. Unlike our human subjects, Multi-tac experimented with a wide variety of combinations of heuristics. Our human subjects rarely had the inclination or patience to ... |

10 | Capturing constraint programming experience: A Case-Based approach.
- Little, Gebruers, et al.
- 2002

10 | Non-model-based algorithm portfolios for SAT.
- Malitsky, Sabharwal, et al.
- 2011

10 | Instance-Based selection of policies for SAT solvers.
- Nikolic, Maric, et al.
- 2009

8 | Algorithm portfolio selection as a bandit problem with unbounded losses.
- Gagliolo, Schmidhuber
- 2011
Citation Context ...this paragraph are the most relevant that use the specific techniques and framework surveyed here. Some publications consider more than one application domain. Stern et al. (2010) choose the best algorithm for Quantified Boolean Formulae and combinatorial auctions. Allen and Minton (1996), Kroer and Malitsky (2011) look at SAT and constraints. Gomes and Selman (2001) consider SAT and Mixed Integer Programming. In addition to these two domains, Kadioglu et al. (2010) also investigate set covering problems. Streeter and Smith (2008) apply their approach to SAT, Integer Programming and planning. Gagliolo and Schmidhuber (2011), Kotthoff et al. (2012), Kotthoff (2012a) compare the performance across Algorithm Selection problems from constraints, Quantified Boolean Formulae and SAT. In most cases, researchers take some steps to adapt their approaches to the application domain. This is usually done by using domain-specific features, such as the number of constraints and variables in constraint programming. In principle, this is not a limitation of the proposed techniques as those features can be exchanged for ones that are applicable in other application domains. While the overall approach remains valid, t...

8 | Adaptive online time allocation to search algorithms.
- Gagliolo, Zhumatiy, et al.
- 2004
Citation Context ...(2007) also select algorithms for recursively solving sub-problems. The PRODIGY system (Carbonell, Etzioni, Gil, Joseph, Knoblock, Minton, & Veloso, 1991) selects the next operator to apply in order to reach the goal state of a planning problem at each node in the search tree. Similarly, Langley (1983a) learn weights for operators that can be applied at each search state and select from among them accordingly. Most approaches rely on an offline element that makes a decision before search starts. In the case of recursive calls, this is no different from making a decision during search however. Gagliolo et al. (2004), Gagliolo and Schmidhuber (2005, 2006b) on the other hand learn the Algorithm Selection model only dynamically while the problem is being solved. Initially, all algorithms in the portfolio are allocated the same (small) time slice. As search progresses, the allocation strategy is updated, giving more resources to algorithms that have exhibited better performance. The expected fastest algorithm receives half of the total time, the next best algorithm half of the remaining time and so on. Armstrong, Christen, McCreath, and Rendell (2006) also rely exclusively on a selection model trained online... |

8 | DVRP: a hard dynamic combinatorial optimisation problem tackled by an evolutionary hyper-heuristic.
- Garrido, Riff
- 2010
Citation Context ...at is only slightly worse than the best one, it has less impact than choosing an algorithm that is orders of magnitude worse. Using the penalty during training is a way of guiding the learned model towards the problems where the potential performance improvement is large. There are many different approaches to how portfolio selectors operate. The selector is not necessarily an explicit part of the system. Minton (1996) compiles the Algorithm Selection system into a Lisp programme for solving the original constraint problem. The selection rules are part of the programme logic. Fukunaga (2008), Garrido and Riff (2010) evolve selectors and combinators of heuristic building blocks using genetic algorithms. The selector is implicit in the evolved programme. 4.1 Performance models The way the selector operates is closely linked to the way the performance model of the algorithms in the portfolio is built. In early approaches, the performance model was usually not learned but given in the form of human expert knowledge. Borrett et al. (1996), Sakkout et al. (1996) use handcrafted rules to determine whether to switch the algorithm during solving. Allen and Minton (1996) also have hand-crafted rules, but estimate ... |

8 | Ensemble classification for constraint solver configuration.
- Kotthoff, Miguel, et al.
- 2010
Citation Context ...rik (2005) learn performance models based on Markov Decision Processes. Kotthoff et al. (2012) use statistical relational learning to predict the ranking of the algorithms in the portfolio on a particular problem. None of these approaches make explicit use of the uncertainty attached to a decision though. Other approaches include support vector machines (Hough & Williams, 2006; Arbelaez et al., 2009), reinforcement learning (Armstrong et al., 2006), neural networks (Gagliolo & Schmidhuber, 2005), decision tree ensembles (Hough & Williams, 2006), ensembles of general classification algorithms (Kotthoff, Miguel, & Nightingale, 2010), boosting (Bhowmick et al., 2006), hybrid approaches that combine regression and classification (Kotthoff, 2012a), multinomial logistic regression (Samulowitz & Memisevic, 2007), self-organising maps (Smith-Miles, 2008b) and clustering (Stamatatos & Stergiou, 2009; Stergiou, 2009; Kadioglu et al., 2010). Sayag et al. (2006), Streeter et al. (2007a) compute schedules for running the algorithms in the portfolio based on a statistical model of the problem instance distribution and performance data for the algorithms. This is not an exhaustive list, but focuses on the most prominent approaches a... |

8 | Learning search strategies through discrimination.
- Langley
- 1983
Citation Context ...es. Elsayed and Michel (2010, 2011) use a similar idea to construct search strategies for solving constraint problems. Fukunaga (2002, 2008) proposes CLASS, which combines heuristic building blocks to form composite heuristics for solving SAT problems. In these approaches, there is no strong notion of a portfolio – the algorithm or strategy used to solve a problem is assembled from lower level components. Closely related is the concept of specialising generic building blocks for the problem to solve. This approach is taken in the SAGE system (Strategy Acquisition Governed by Experimentation) (Langley, 1983b, 1983a). It starts with a set of general operators that can be applied to a search state. These operators are refined by making the preconditions more specific based on their utility for finding a solution. The Multi-tac (Multi-tactic Analytic Compiler) system (Minton, 1993b, 1993a, 1996) specialises a set of generic heuristics for the constraint problem to solve. There can be complex restrictions on how the building blocks are combined. RT-Syn (Smith & Setliff, 1992) for example uses a preprocessing step to determine the possible combinations of algorithms and data structures to solve a sof... |

7 | Impact of censored sampling on the performance of restart strategies.
- Gagliolo, Schmidhuber
- 2006
Citation Context ...onstraint problems. Vrakas et al. (2003) learn rules automatically, but then filter them manually. A more common approach today is to automatically learn performance models using Machine Learning on training data. The portfolio algorithms are run on a set of representative problems and based on these experimental results, performance models are built. This approach is used by Xu et al. (2008), Pulina and Tacchella (2007), O’Mahony et al. (2008), Kadioglu et al. (2010), Guerri and Milano (2004), to name but a few examples. A drawback of this approach is that the training time is usually large. Gagliolo and Schmidhuber (2006a) investigate ways of mitigating this problem by using censored sampling, which introduces an upper bound on the runtime of each experiment in the training phase. Kotthoff, Gent, and Miguel (2012) also investigate censored sampling where not all algorithms are run on all problems in the training phase. Their results show that censored sampling may not have a significant effect on the performance of the learned model. Models can also be built without a separate training phase, but while the problem is solved. This approach is used by Gagliolo and Schmidhuber (2006b), Armstrong et al. (2006) fo... |
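
Censored sampling as described here can be sketched in a few lines; representing a training run as a plain runtime and the cap value are illustrative assumptions.

```python
# Sketch: censored sampling during the training phase (in the spirit of
# Gagliolo and Schmidhuber, 2006a). Each training run is capped at an
# upper bound; runs that hit the bound are recorded as censored
# observations instead of being run to completion.

def censor(runtimes, bound):
    """Return (observed runtime, was_censored) for each training run."""
    return [(min(t, bound), t > bound) for t in runtimes]
```

A learner can then treat censored observations as lower bounds on the true runtime rather than exact values, which is what bounds the cost of the training phase.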

7 | Practical aspects of algorithm portfolio design.
- Gomes, Selman
- 1997
Citation Context ...in most cases, they can be measured directly. In Machine Learning, ensembles (Dietterich, 2000) are instances of algorithm portfolios. In fact, the only difference between algorithm portfolios and Machine Learning ensembles is the way in which its constituents are used. The idea of algorithm portfolios was first presented by Huberman, Lukose, and Hogg (1997). They describe a formal framework for the construction and application of algorithm portfolios and evaluate their approach on graph colouring problems. Within the Artificial Intelligence community, algorithm portfolios were popularised by Gomes and Selman (1997a, 1997b) and a subsequent extended investigation (Gomes & Selman, 2001). The technique itself however had been described under different names by other authors at about the same time in different contexts. Tsang et al. (1995) experimentally show for a selection of constraint satisfaction algorithms and heuristics that none is the best on all evaluated problems. They do not mention portfolios, but propose that future research should focus on identifying when particular algorithms and heuristics deliver the best performance. This implicitly assumes a portfolio to choose algorithms from. Allen a... |

7 | Restart strategy selection using machine learning techniques.
- Haim, Walsh
- 2009
Citation Context ... long as it is correct relative to the other predictions. This is the approach that is implicitly assumed in Rice’s framework. The prediction is the performance mapping P(A, x) for an algorithm A ∈ A on a problem x ∈ P. Models for each algorithm in the portfolio are used for example by Xu et al. (2008), Howe et al. (1999), Allen and Minton (1996), Lobjois and Lemaître (1998), Gagliolo and Schmidhuber (2006b). A common way of doing this is to use regression to directly predict the performance of each algorithm. This is used by Xu et al. (2008), Howe et al. (1999), Leyton-Brown et al. (2002), Haim and Walsh (2009), Roberts and Howe (2007). The performance of the algorithms in the portfolio is evaluated on a set of training problems, and a relationship between the characteristics of a problem and the performance of an algorithm derived. This relationship usually has the form of a simple formula that is cheap to compute at runtime. Silverthorn and Miikkulainen (2010) on the other hand learn latent class models of unobserved variables to capture relationships between solvers, problems and run durations. Based on the predictions, the expected utility is computed and used to select an algorithm. Sillito (20...
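
The regression approach described here can be sketched with a one-feature least-squares model per algorithm; the single feature, the linear form and the toy coefficients are illustrative assumptions (systems such as SATzilla use many features and richer models).

```python
# Sketch: per-algorithm runtime regression, then selection by lowest
# predicted runtime. One-feature ordinary least squares keeps the example
# dependency-free; real systems use many features and richer models.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b over a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def select_algorithm(models, feature_value):
    """models: {name: (a, b)}. Pick the algorithm with the lowest
    predicted runtime on a problem with this feature value."""
    return min(models, key=lambda m: models[m][0] * feature_value + models[m][1])
```

The fitted formula is indeed cheap to evaluate at runtime: one multiplication and one addition per algorithm in the portfolio.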

6 | Online heuristic selection in constraint programming.
- Arbelaez, Hamadi, et al.
- 2009
Citation Context ...ulina, and Tacchella (2010) learn Bayesian models that incorporate collaborative filtering. Domshlak, Karpas, and Markovitch (2010) learn decision rules using naïve Bayes classifiers. Lagoudakis and Littman (2000), Petrik (2005) learn performance models based on Markov Decision Processes. Kotthoff et al. (2012) use statistical relational learning to predict the ranking of the algorithms in the portfolio on a particular problem. None of these approaches make explicit use of the uncertainty attached to a decision though. Other approaches include support vector machines (Hough & Williams, 2006; Arbelaez et al., 2009), reinforcement learning (Armstrong et al., 2006), neural networks (Gagliolo & Schmidhuber, 2005), decision tree ensembles (Hough & Williams, 2006), ensembles of general classification algorithms (Kotthoff, Miguel, & Nightingale, 2010), boosting (Bhowmick et al., 2006), hybrid approaches that combine regression and classification (Kotthoff, 2012a), multinomial logistic regression (Samulowitz & Memisevic, 2007), self-organising maps (Smith-Miles, 2008b) and clustering (Stamatatos & Stergiou, 2009; Stergiou, 2009; Kadioglu et al., 2010). Sayag et al. (2006), Streeter et al. (2007a) compute sched...

6 | Feature filtering for Instance-Specific algorithm configuration.
- Kroer, Malitsky
- 2011
Citation Context ...se products. Kotthoff et al. (2012) automatically select the most important subset of the original set of features, but conclude that in practice the performance improvement compared to using all features is not significant. Wilson et al. (2000) use genetic algorithms to determine the importance of the individual features. Petrovic and Qu (2002) evaluate subsets of the features they use and learn weights for each of them. Roberts et al. (2008) consider using a single feature and automatic selection of a subset of all features. Guo and Hsu (2004) and Kroer and Malitsky (2011) also use techniques for automatically determining the most predictive subset of features. Kotthoff (2012a) compares the performance of ten different sets of features. It is not only important to use informative features, but also features that are cheap to compute. If the cost of computing the features and making the decision is too high, the performance improvement from selecting the best algorithm might be eroded. Xu et al. (2009) predict the feature computation time for a given problem and fall back to a default selection if it is too high to avoid this problem. They also limit the computa...

5 | Statistical selection among Problem-Solving methods.
- Fink
- 1997
Citation Context ...ition to having no way of mitigating wrong choices, often these will not even be detected. These approaches do not monitor the execution of the chosen algorithms to confirm that they conform with the expectations that led to them being chosen. Purely offline approaches are inherently vulnerable to bad choices. Their advantage however is that they only need to select an algorithm once and incur no overhead while the problem is being solved. Moving towards online systems, the next step is to monitor the execution of an algorithm or a schedule to be able to intervene if expectations are not met. Fink (1997, 1998) investigates setting a time bound for the algorithm that has been selected based on the predicted performance. If the time bound is exceeded, the solution attempt is abandoned. More sophisticated systems furthermore adjust their selection if such a bound is exceeded. Borrett et al. (1996) try to detect behaviour during search that indicates that the algorithm is performing badly, for example visiting nodes in a subtree of the search that clearly do not lead to a solution. If such behaviour is detected, they propose switching the currently running algorithm according to a fi...
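
The monitoring scheme Fink investigates (a time bound derived from the predicted performance, abandoning the attempt when it is exceeded) can be sketched as follows; the algorithm interface, the fallback policy and the safety factor are illustrative assumptions.

```python
# Sketch of Fink-style monitoring: run the selected algorithm under a time
# bound derived from its predicted runtime and abandon the attempt if the
# bound is exceeded. Algorithms are modelled as callables returning
# (solved, elapsed); the interface and safety factor are illustrative.

def run_with_bound(selected, fallback, predicted, safety=2.0):
    solved, elapsed = selected(bound=predicted * safety)
    if solved:
        return "selected", elapsed
    # Bound exceeded: abandon the solution attempt, switch to the fallback.
    fb_solved, fb_elapsed = fallback(bound=None)
    return "fallback", elapsed + fb_elapsed
```

A more sophisticated system would re-run the selection at this point instead of falling back to a fixed default.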

5 | Towards distributed algorithm portfolios.
- Gagliolo, Schmidhuber
- 2008
Citation Context ...his fixed amount of time. Similar to Gerevini et al. (2009), they simulate the performance of different allocations and select the best one based on the results of these simulations. Fukunaga (2000) estimates the performance of candidate allocations through bootstrap sampling. Gomes and Selman (1997a, 2001) also evaluate the performance of different candidate portfolios, but take into account how many algorithms can be run in parallel. They demonstrate that the optimal schedule (in this case the number of algorithms that are being run) changes as the number of available processors increases. Gagliolo and Schmidhuber (2008) investigate how to allocate resources to algorithms in the presence of multiple CPUs that allow running more than one algorithm in parallel. Yun and Epstein (2012) craft portfolios with the specific aim of running the algorithms in parallel. Related research is concerned with the scheduling of restarts of stochastic algorithms – it also investigates the best way of allocating resources. The paper that introduced algorithm portfolios (Huberman et al., 1997) uses a portfolio of identical stochastic algorithms that are run with different random seeds. There is a large amount of research on how to...

5 | Machine learning for constraint solver design - a case study for the alldifferent constraint.
- Gent, Kotthoff, et al.
- 2010
Citation Context ...he algorithm that was actually selected. Predicting the time required to analyse a problem is a closely related idea. If the predicted required analysis time is too high, a default algorithm with reasonable performance is chosen and run on the problem. This technique is particularly important in cases where the problem is hard to analyse, but easy to solve. As some systems use information that comes from exploring part of the search space (cf. Section 5), this is a very relevant concern in practice. On some problems, even probing just a tiny part of the search space may take a very long time. Gent et al. (2010a), Gent, Kotthoff, Miguel, and Nightingale (2010b) report that using the misclassification penalty as a weight for the individual problems during training improves the quality of the predictions. The misclassification penalty quantifies the “badness” of a wrong prediction; in this case as the additional time required to solve a problem. If an algorithm was chosen that is only slightly worse than the best one, it has less impact than choosing an algorithm that is orders of magnitude worse. Using the penalty during training is a way of guiding the learned model towards the problems where the po... |

4 | An automated approach to generating efficient constraint solvers.
- Balasubramaniam, Gent, et al.
- 2012
Citation Context ...et al., 2010). Hydra uses ParamILS (Hutter, Hoos, & Stützle, 2007; Hutter, Hoos, Leyton-Brown, & Stützle, 2009) to automatically tune algorithms in a SATzilla (Xu et al., 2008) portfolio. ISAC (Kadioglu et al., 2010) uses GGA (Ansótegui, Sellmann, & Tierney, 2009) to automatically tune algorithms for clusters of problem instances. Minton (1996) first enumerates all possible rule applications up to a certain time or size bound. Then, the most promising configuration is selected using beam search, a form of parallel hill climbing, that empirically evaluates the performance of each candidate. Balasubramaniam et al. (2012) use hill climbing to similarly identify the most efficient configuration for a constraint solver on a set of problems. Terashima-Marín, Ross, and Valenzuela-Rendón (1999), Fukunaga (2002) use genetic algorithms to evolve promising configurations. The systems described in the previous paragraph are only of limited suitability for dynamic algorithm portfolios. They either take a long time to find good configurations or are restricted in the number or type of parameters. Interactions between parameters are only taken into account in a limited way. Mo...

4 | Towards Low-Cost, High-Accuracy classifiers for linear solver selection.
- Bhowmick, Toth, et al.
- 2009

4 | Combining multiple heuristics on discrete resources.
- Bougeret, Dutot, et al.
- 2009

4 | Modern machine learning for automatic optimization algorithm selection.
- Hough, Williams
- 2006
Citation Context ...ental results with sequential Bayesian linear regression and Gaussian Process regression. Guo (2003), Guo and Hsu (2004) explore using decision trees, naïve Bayes rules, Bayesian networks and meta-learning techniques. They also chose the C4.5 decision tree inducer because it is one of the top performers and creates models that are easy to understand and quick to execute. Gebruers, Hnich, Bridge, and Freuder (2005) compare nearest neighbour classifiers, decision trees and statistical models. They show that a nearest neighbour classifier outperforms all the other approaches on their data sets. Hough and Williams (2006) use decision tree ensembles and support vector machines. Bhowmick et al. (2006) investigate alternating decision trees and various forms of boosting, while Pulina and Tacchella (2007) use decision trees, decision rules, logistic regression and nearest neighbour approaches. They do not explicitly choose one of these methods in the paper, but their Algorithm Selection system AQME uses a nearest neighbour classifier by default. Roberts and Howe (2007) use 32 different Machine Learning algorithms to predict the runtime of algorithms and probability of success. They attempt to provide explanations...

4 |
On Algorithm Selection, with an Application to Combinatorial Search Problems.
- Kotthoff
- 2012
Citation Context ...but conclude that in practice the performance improvement compared to using all features is not significant. Wilson et al. (2000) use genetic algorithms to determine the importance of the individual features. Petrovic and Qu (2002) evaluate subsets of the features they use and learn weights for each of them. Roberts et al. (2008) consider using a single feature and automatic selection of a subset of all features. Guo and Hsu (2004) and Kroer and Malitsky (2011) also use techniques for automatically determining the most predictive subset of features. Kotthoff (2012a) compares the performance of ten different sets of features. It is not only important to use informative features, but also features that are cheap to compute. If the cost of computing the features and making the decision is too high, the performance improvement from selecting the best algorithm might be eroded. Xu et al. (2009) predict the feature computation time for a given problem and fall back to a default selection if it is too high to avoid this problem. They also limit the computation time for the most expensive features as well as the total time allowed to compute features. Bhowmick...
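The fallback policy the excerpt attributes to Xu et al. (2009) can be sketched as a simple guard: if the predicted cost of computing the features exceeds a budget, skip selection and run the default solver. The cost predictor, solver names, and budget here are illustrative stand-ins, not the published model.

```python
# Sketch of a feature-cost fallback: only pay for feature computation and
# model-based selection when the predicted cost stays within a budget;
# otherwise fall back to a default solver. All names are hypothetical.
def choose_solver(instance, predict_feature_cost, select_with_features,
                  default_solver, cost_budget):
    if predict_feature_cost(instance) > cost_budget:
        return default_solver  # features predicted too expensive: fall back
    return select_with_features(instance)

choice = choose_solver(
    instance={"size": 10_000},
    predict_feature_cost=lambda inst: inst["size"] / 1000.0,  # assumed cost model
    select_with_features=lambda inst: "tuned_solver",
    default_solver="default_solver",
    cost_budget=5.0,
)
```

The design point the excerpt makes is captured by the guard: the expected gain from picking the best algorithm must exceed the cost of deciding, or selection is not worth doing at all.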

3 |
Synthesis of search algorithms from high-level CP models.
- Elsayed, Michel
- 2011
Citation Context ...ithms in the portfolio are selected from A′ once either manually by the designer of the portfolio or automatically based on empirical results from training data. One approach is to build a portfolio by combining algorithmic building blocks. An example of this is the Adaptive Constraint Engine (ACE) (Epstein & Freuder, 2001; Epstein, Freuder, Wallace, Morozov, & Samuels, 2002). The building blocks are so-called advisors, which characterise variables of the constraint problem and give recommendations as to which one to process next. ACE combines these advisors into more complex ones. Elsayed and Michel (2010, 2011) use a similar idea to construct search strategies for solving constraint problems. Fukunaga (2002, 2008) proposes CLASS, which combines heuristic building blocks to form composite heuristics for solving SAT problems. In these approaches, there is no strong notion of a portfolio – the algorithm or strategy used to solve a problem is assembled from lower level components. Closely related is the concept of specialising generic building blocks for the problem to solve. This approach is taken in the SAGE system (Strategy Acquisition Governed by Experimentation) (Langley, 1983b, 1983a). It s...

3 |
Online Dynamic Algorithm Portfolios – Minimizing the computational cost of problem solving.
- Gagliolo
- 2010
Citation Context ...not necessarily complete or deterministic, i.e. they are not guaranteed to find a solution if it exists or to always make the same decision under the same circumstances. The nature of heuristics makes them particularly amenable to Algorithm Selection – choosing a heuristic manually is difficult even for experts, but choosing the correct one can improve performance significantly. Several doctoral dissertations with related work chapters that survey the literature on Algorithm Selection have been produced. Examples of the more recent ones include Streeter (2007), Hutter (2009), Carchrae (2009), Gagliolo (2010), Ewald (2010), Kotthoff (2012b), Malitsky (2012). Smith-Miles (2008a) presents a survey with similar aims. It looks at the Algorithm Selection Problem from the Machine Learning point of view and focuses on seeing Algorithm Selection as a learning problem. As a consequence, great detail is given for aspects that are relevant to Machine Learning. In this paper, we take a more practical point of view and focus on techniques that facilitate and implement Algorithm Selection systems. We are furthermore able to take more recent work in this fast-moving area into account. In contrast to most other wo...

3 | On using reinforcement learning to solve sparse linear systems.
- Kuefler, Chen
- 2008
Citation Context ...ic features are for example Borrett et al. (1996), Nareyek (2001), Stergiou (2009). There are many different features that can be computed during search. Minton (1996) determines how closely a generated heuristic approximates a generic target heuristic by checking the heuristic choices at random points during search. He selects the one with the closest match. Similarly, Nareyek (2001) learns how to select heuristics during the search process based on their performance. Armstrong et al. (2006) use an agent-based model that rewards good actions and punishes bad actions based on computation time. Kuefler and Chen (2008) follow a very similar approach that also takes success or failure into account. Carchrae and Beck (2004, 2005) monitor the solution quality during search. They decide whether to switch the current algorithm based on this by changing the allocation of resources. Wei et al. (2008) monitor a feature that is specific to their application domain, the distribution of clause weights in SAT, during search and use it to decide whether to switch a heuristic. Stergiou (2009) monitors propagation events in a constraint solver with a similar aim. Caseau et al. (1999) evaluate the performance of candidate al...
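The reward/punish scheme the excerpt describes for Armstrong et al. (2006) and Kuefler and Chen (2008) can be sketched as a running score per heuristic: reward success, penalise failure, weight by computation time, and always pick the current top scorer. The class, names, and update rule below are illustrative assumptions, not the published formulation.

```python
# Toy online heuristic selector: each heuristic keeps a score that is
# increased after a fast/successful step and decreased otherwise; the
# highest-scoring heuristic is chosen for the next step.
class OnlineHeuristicSelector:
    def __init__(self, heuristics):
        self.scores = {h: 0.0 for h in heuristics}

    def choose(self):
        # Greedy choice: the heuristic with the highest accumulated score.
        return max(self.scores, key=self.scores.get)

    def feedback(self, heuristic, success, elapsed):
        # Reward success, punish failure; faster steps change the score more.
        delta = (1.0 if success else -1.0) / (1.0 + elapsed)
        self.scores[heuristic] += delta

sel = OnlineHeuristicSelector(["min_domain", "max_degree"])
sel.feedback("min_domain", success=False, elapsed=2.0)  # score drops
sel.feedback("max_degree", success=True, elapsed=1.0)   # score rises
```

This is the simplest greedy variant; the agent-based systems in the excerpt additionally balance exploration against exploitation, which a sketch this small omits.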

2 |
Algorithm Selection for Search: A survey
- Armstrong, Christen, et al.
- 2006
Citation Context ...iolo and Schmidhuber (2006a) investigate ways of mitigating this problem by using censored sampling, which introduces an upper bound on the runtime of each experiment in the training phase. Kotthoff, Gent, and Miguel (2012) also investigate censored sampling where not all algorithms are run on all problems in the training phase. Their results show that censored sampling may not have a significant effect on the performance of the learned model. Models can also be built without a separate training phase, but while the problem is solved. This approach is used by Gagliolo and Schmidhuber (2006b), Armstrong et al. (2006) for example. While this significantly reduces the time to build a system, it can mean that the result is less effective and efficient. At the beginning, when no performance models have been built, the decisions of the selector might be poor. Furthermore, creating and updating performance models while the problem is being solved incurs an overhead. The choice of Machine Learning technique is affected by the way the portfolio selector operates. Some techniques are more amenable to offline approaches (e.g. linear regression models used by Xu et al., 2008), while others lend themselves to online me...
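Censored sampling as described in the excerpt is mechanically simple: every training run is cut off at an upper bound, so the recorded value is min(true runtime, cap), together with a flag marking whether the observation was censored. A minimal sketch with made-up runtimes:

```python
# Censored sampling for training data: cap each observed runtime at an
# upper bound and record whether the true value exceeded the cap.
# The runtimes below are invented for illustration.
CAP = 10.0

def censor(runtimes, cap=CAP):
    """Return (observed_runtime, was_censored) pairs for a set of runs."""
    return [(min(t, cap), t > cap) for t in runtimes]

observed = censor([3.2, 25.0, 9.9, 120.0])
```

The flag matters downstream: a model trained naively on the capped values would systematically underestimate long runtimes, so censored observations are usually treated as lower bounds rather than exact measurements.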

2 |
Knowledge-based algorithm construction for real-world engineering PDEs.
- Cahill
- 1994
Citation Context ...ial search problems or otherwise. Nevertheless this survey would not be complete without a brief exposition of the various contexts in which Algorithm Selection techniques have been applied. Over the years, Algorithm Selection systems have been used in many different application domains. These range from Mathematics, e.g. differential equations (Kamel, Enright, & Ma, 1993; Weerawarana et al., 1996), linear algebra (Demmel et al., 2005) and linear systems (Bhowmick et al., 2006; Kuefler & Chen, 2008), to the selection of algorithms and data structures in software design (Smith & Setliff, 1992; Cahill, 1994; Brewer, 1995; Wilson et al., 2000). A very common application domain is combinatorial search problems such as SAT (Xu et al., 2008; Lagoudakis & Littman, 2001; Silverthorn & Miikkulainen, 2010), constraints (Minton, 1996; Epstein et al., 2002; O’Mahony et al., 2008), Mixed Integer Programming (Xu, Hutter, Hoos, & Leyton-Brown, 2011), Quantified Boolean Formulae (Pulina & Tacchella, 2009; Stern et al., 2010), planning (Carbonell et al., 1991; Howe et al., 1999; Vrakas et al., 2003), scheduling (Beck & Fox, 2000; Beck & Freuder, 2004; Cicirello & Smith, 2005), combinatorial auctions (Leyton-B...

2 |
Low Knowledge Algorithm Control for Constraint-Based Scheduling.
- Carchrae
- 2009
Citation Context .... Heuristics are not necessarily complete or deterministic, i.e. they are not guaranteed to find a solution if it exists or to always make the same decision under the same circumstances. The nature of heuristics makes them particularly amenable to Algorithm Selection – choosing a heuristic manually is difficult even for experts, but choosing the correct one can improve performance significantly. Several doctoral dissertations with related work chapters that survey the literature on Algorithm Selection have been produced. Examples of the more recent ones include Streeter (2007), Hutter (2009), Carchrae (2009), Gagliolo (2010), Ewald (2010), Kotthoff (2012b), Malitsky (2012). Smith-Miles (2008a) presents a survey with similar aims. It looks at the Algorithm Selection Problem from the Machine Learning point of view and focuses on seeing Algorithm Selection as a learning problem. As a consequence, great detail is given for aspects that are relevant to Machine Learning. In this paper, we take a more practical point of view and focus on techniques that facilitate and implement Algorithm Selection systems. We are furthermore able to take more recent work in this fast-moving area into account. In contrast...

2 |
Selecting simulation algorithm portfolios by genetic algorithms.
- Ewald, Schulz, et al.
- 2010
Citation Context ...005), combinatorial auctions (Leyton-Brown et al., 2002; Gebruers et al., 2004; Gagliolo & Schmidhuber, 2006b), Answer Set Programming (Gebser, Kaminski, Kaufmann, Schaub, Schneider, & Ziller, 2011), the Travelling Salesperson Problem (Fukunaga, 2000) and general search algorithms (Langley, 1983b; Cook & Varnell, 1997; Lobjois & Lemaître, 1998). Other domains include Machine Learning (Soares et al., 2004; Leite et al., 2010), the most probable explanation problem (Guo & Hsu, 2004), parallel reduction algorithms (Yu et al., 2004; Yu & Rauchwerger, 2006) and simulation (Wang & Tropper, 2007; Ewald et al., 2010). It should be noted that a significant part of Machine Learning research is concerned with developing Algorithm Selection techniques; the publications listed in this paragraph are the most relevant that use the specific techniques and framework surveyed here. Some publications consider more than one application domain. Stern et al. (2010) choose the best algorithm for Quantified Boolean Formulae and combinatorial auctions. Allen and Minton (1996), Kroer and Malitsky (2011) look at SAT and constraints. Gomes and Selman (2001) consider SAT and Mixed Integer Programming. In addition to these two...

2 |
Genetic algorithm portfolios.
- Fukunaga
- 2000
Citation Context ...survey a schedule with the aim of improving the average-case performance. In later work, they compute theoretical guarantees for the performance of their schedule (Streeter & Smith, 2008). Wu and van Beek (2007) approach scheduling the chosen algorithms in a different way and assume a fixed limit on the amount of resources an algorithm can consume while solving a problem. All algorithms are run sequentially for this fixed amount of time. Similar to Gerevini et al. (2009), they simulate the performance of different allocations and select the best one based on the results of these simulations. Fukunaga (2000) estimates the performance of candidate allocations through bootstrap sampling. Gomes and Selman (1997a, 2001) also evaluate the performance of different candidate portfolios, but take into account how many algorithms can be run in parallel. They demonstrate that the optimal schedule (in this case the number of algorithms that are being run) changes as the number of available processors increases. Gagliolo and Schmidhuber (2008) investigate how to allocate resources to algorithms in the presence of multiple CPUs that allow more than one algorithm to be run in parallel. Yun and Epstein (2012) craf...
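Bootstrap estimation of an allocation's quality, in the spirit the excerpt attributes to Fukunaga (2000), can be sketched by resampling observed runtimes with replacement and measuring how often a run would finish within the allocated time. The data, success criterion, and sample count below are illustrative assumptions.

```python
# Bootstrap estimate of P(run finishes within time_limit): repeatedly
# resample observed runtimes with replacement and count the successes.
import random

def bootstrap_success_rate(observed_runtimes, time_limit, samples=1000, seed=0):
    """Estimate the success probability of a time allocation by resampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        t = rng.choice(observed_runtimes)  # draw one past runtime, with replacement
        if t <= time_limit:
            hits += 1
    return hits / samples

# Hypothetical past runtimes of one algorithm (seconds).
runs = [1.0, 2.0, 4.0, 8.0, 16.0]
estimate = bootstrap_success_rate(runs, time_limit=5.0)
```

Comparing such estimates across candidate time allocations gives a cheap, simulation-based ranking without rerunning any algorithm, which is the point of the bootstrap approach.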

2 |
Algorithm Selection for Search: A survey
- Fukunaga
- 2008
Citation Context ...thm was chosen that is only slightly worse than the best one, it has less impact than choosing an algorithm that is orders of magnitude worse. Using the penalty during training is a way of guiding the learned model towards the problems where the potential performance improvement is large. There are many different approaches to how portfolio selectors operate. The selector is not necessarily an explicit part of the system. Minton (1996) compiles the Algorithm Selection system into a Lisp programme for solving the original constraint problem. The selection rules are part of the programme logic. Fukunaga (2008), Garrido and Riff (2010) evolve selectors and combinators of heuristic building blocks using genetic algorithms. The selector is implicit in the evolved programme. 4.1 Performance models The way the selector operates is closely linked to the way the performance model of the algorithms in the portfolio is built. In early approaches, the performance model was usually not learned but given in the form of human expert knowledge. Borrett et al. (1996), Sakkout et al. (1996) use handcrafted rules to determine whether to switch the algorithm during solving. Allen and Minton (1996) also have hand-cra... |

2 |
Using active testing and Meta-Level information for selection of classification algorithms.
- Leite, Brazdil, et al.
- 2010
Citation Context ... adapted dynamically. Tolpin and Shimony (2011) use the value of information of selecting an algorithm, defined as the amount of time saved by making this choice. Xu, Hutter, Hoos, and Leyton-Brown (2009) predict the penalized average runtime score, a measure that combines runtime with possible timeouts. This approach aims to provide more realistic performance predictions when runtimes are capped. More complex predictions can be made, too. In most cases, these are made by combining simple predictions such as the runtime performance. Brazdil and Soares (2000), Soares, Brazdil, and Kuba (2004), Leite et al. (2010) produce rankings of the portfolio algorithms. Kotthoff et al. (2012) use statistical relational learning to directly predict the ranking instead of deriving it from other predictions. Howe et al. (1999), Gagliolo et al. (2004), Gagliolo and Schmidhuber (2006b), Roberts and Howe (2006), O’Mahony et al. (2008) predict resource allocations for the algorithms in the portfolios. Gebruers et al. (2005), Little, Gebruers, Bridge, and Freuder (2002), Borrett and Tsang (2001) consider selecting the most appropriate formulation of a constraint problem. Smith ...