## Algorithm portfolio selection as a bandit problem with unbounded losses (2011)

### BibTeX

```bibtex
@misc{Gagliolo11algorithmportfolio,
  author = {Matteo Gagliolo and Jürgen Schmidhuber},
  title  = {Algorithm portfolio selection as a bandit problem with unbounded losses},
  year   = {2011}
}
```

### Citations

1198 |
Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman
- 1994
(Show Context)
Citation Context ...ally spaced time values, and the last allocation is kept indefinitely, resulting in a total of b time intervals. The problem of selecting the schedule is formulated as a Markov Decision Process (MDP, =-=[39]-=-), and a variation of dynamic programming is used to select the per-set optimal schedule. Both resource sharing and task switching, among deterministic algorithms, are considered in [42], where the ti... |

669 | The weighted majority algorithm
- Littlestone, Warmuth
- 1994
(Show Context)
Citation Context ...on about the losses, and can therefore be applied to an arbitrary game, be it stationary, nonstationary, or adversarial. Exp3Light [6, Sec. 4] is a modified version of the weighted majority algorithm =-=[31]-=-, in which the cumulative losses for each arm are obtained through an unbiased estimate. The game is subdivided into a sequence of epochs r = 0, 1, ...: in each epoch, the probability distribution over... |
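The context above describes the core mechanism: arms are weighted exponentially in unbiased cumulative-loss estimates, with only the played arm's loss observed. A minimal single-trial sketch in Python (the function name, the fixed learning rate `eta`, and the `observe_loss` callback are illustrative assumptions; Exp3Light's epoch structure and per-epoch tuning are omitted):

```python
import math
import random

def exp3_step(cum_losses, eta, observe_loss, rng=random):
    """Play one trial of an Exp3-style exponentially weighted forecaster.

    cum_losses: per-arm unbiased estimates of cumulative loss (mutated in place).
    eta:        learning rate (a hypothetical fixed value; Exp3Light tunes this
                per epoch).
    observe_loss(arm) -> loss of the chosen arm (bandit feedback only).
    """
    m = min(cum_losses)  # shift for numerical stability; probabilities unchanged
    weights = [math.exp(-eta * (L - m)) for L in cum_losses]
    total = sum(weights)
    probs = [w / total for w in weights]
    arm = rng.choices(range(len(cum_losses)), weights=probs)[0]
    loss = observe_loss(arm)
    # Importance-weighted update: E[(loss / p_arm) * 1{arm played}] = loss,
    # so the cumulative-loss estimate stays unbiased under bandit feedback.
    cum_losses[arm] += loss / probs[arm]
    return arm, probs
```

On a stationary two-arm game with mean losses 0.2 and 0.8, the probability mass concentrates on the better arm after a few hundred trials.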

311 | The nonstochastic multiarmed bandit problem
- Auer, Cesa-Bianchi, et al.
(Show Context)
Citation Context ...dividing the problem set into a training and a test set, we iteratively update the model each time an instance is solved, and use it to guide algorithm selection on the next instance. Bandit problems =-=[2]-=- offer a solid theoretical framework for dealing with exploration-exploitation trade-offs in the online setting. One important obstacle to the straightforward application of a bandit problem solver to... |

268 | Some aspects of the sequential design of experiments
- Robbins
- 1952
(Show Context)
Citation Context ...antage of (10) over our allocators is that it can be evaluated in time O(N), as it requires N line searches to find the optimal τ for each algorithm. 4 Online time allocation In its most basic form =-=[41]-=-, the multi-armed bandit problem is faced by a gambler playing a sequence of trials against a K-armed slot machine. At each trial, the gambler chooses one of the available arms, whose losses are ran... |

238 |
Stochastic Local Search: Foundations and Applications
- Hoos, Stützle
- 2004
(Show Context)
Citation Context ...s Vegas algorithms (LVA) [3], i.e., algorithms whose performance on a single instance coincides with their runtime, which is in general a random variable. More precisely, we consider generalized LVAs =-=[23]-=-, which are not guaranteed to solve a given instance in finite time. This class includes all solvers for decision or search problems, where the aim is to find a solution, or prove that none exists; b... |

153 |
Survival Analysis: Techniques for Censored and Truncated Data
- Klein, Moeschberger
- 2010
(Show Context)
Citation Context ...t the task of estimating such distributions, finding a vast amount of useful research in the field of survival analysis, a branch of statistics which studies the distribution of random events in time =-=[28]-=-. Such estimation can be carried out using regression models, conditioned on features of the instance [13, 14, 17]. This is analogous to the scalar regression of expected runtime in single algorithm s... |

147 | Heavy-tailed phenomena in satisfiability and constraint satisfaction problems
- Gomes, Selman, et al.
(Show Context)
Citation Context ...y difficult when dealing with algorithm runtimes, which can easily exhibit variations of several orders of magnitude among different problem instances, or even among different runs on the same instance =-=[22]-=-. Some interesting results regarding games with unbounded losses have recently been obtained. In [6, 7], the authors consider a full information game, and provide two algorithms which can adapt to unk... |

126 |
Algorithm portfolios
- Gomes, Selman
- 2001
(Show Context)
Citation Context ...on of a single algorithm is not the most general way of exploiting the performance diversity of a set of LVAs: different algorithms can be combined by running them in parallel, in an algorithm portfolio =-=[21, 25]-=-. Moreover, if the algorithms are randomized, their performance may vary among different runs: in some cases, even the performance of a single algorithm may be improved by combining different runs, in a ... |

117 | Optimal speedup of Las Vegas algorithms
- Luby, Sinclair, et al.
- 1993
(Show Context)
Citation Context ... the performance of a single algorithm may be improved by combining different runs in a portfolio, or by periodically restarting the algorithm with a different random seed, according to a restart strategy =-=[32]-=-. In this line of research, the allocation is based on the runtime distribution (RTD) of each algorithm on the current instance, assumed to be available. The RTD of the resulting portfolio is evaluate... |
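The restart strategy of Luby, Sinclair, et al. [32] cited here is built on the universal sequence 1, 1, 2, 1, 1, 2, 4, ..., where the i-th run is cut off after the i-th term times a base unit of time. A sketch using the standard recursive definition (not taken from this paper's text; the function name is illustrative):

```python
def luby(i):
    """i-th term (1-indexed) of the Luby et al. universal restart sequence:
    1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, ...

    Each run of the randomized algorithm is cut off after luby(i) base
    time units before restarting with a fresh random seed.
    """
    k = 1
    while (1 << k) - 1 < i:          # find k with 2^(k-1) - 1 < i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:            # i = 2^k - 1: the term is 2^(k-1)
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)  # otherwise recurse into the repeated prefix
```

The sequence doubles cutoffs slowly enough that, without knowledge of the RTD, its expected total runtime is within a logarithmic factor of the best fixed cutoff.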

111 |
An economics approach to hard computational problems
- Huberman, Lukose, et al.
- 1997
(Show Context)
Citation Context ...ror” approach is still popular, attempts to automate algorithm selection are not new [40], and have grown to form a consistent and dynamic area of research on Meta-Learning [43]. Algorithm portfolios =-=[25]-=- are a generalization of single algorithm selection, in which computation time is shared among the available algorithms: in this case, the result of the selection process is a schedule [42], according... |

91 | Satzilla: Portfolio-based algorithm selection for sat
- Xu, Hutter, et al.
(Show Context)
Citation Context ...se predictions, the algorithm expected to obtain the best performance is selected, and used to solve the instance. For LVAs, an implementation of this idea was proposed in [30], and later improved in =-=[35, 49]-=-. In this case, the runtime is predicted, and minimized. For each available algorithm, an empirical hardness model [30] is learned offline, based on the runtimes on several training problem instances:... |

86 |
The algorithm selection problem
- Rice
- 1976
(Show Context)
Citation Context ...ticular problem instance being solved: in other words, there is no single “best” algorithm. While the “trial and error” approach is still popular, attempts to automate algorithm selection are not new =-=[40]-=-, and have grown to form a consistent and dynamic area of research on Meta-Learning [43]. Algorithm portfolios [25] are a generalization of single algorithm selection, in which computation time is sha... |

59 | Learning the empirical hardness of optimization problems: The case of combinatorial auctions
- Leyton-Brown, Nudelman, et al.
- 2002
(Show Context)
Citation Context ...m is predicted. Based on these predictions, the algorithm expected to obtain the best performance is selected, and used to solve the instance. For LVAs, an implementation of this idea was proposed in =-=[30]-=-, and later improved in [35, 49]. In this case, the runtime is predicted, and minimized. For each available algorithm, an empirical hardness model [30] is learned offline, based on the runtimes on sev... |

44 | Improved secondorder bounds for prediction with expert advice
- Cesa-Bianchi, Mansour, et al.
- 2007
(Show Context)
Citation Context ...sily vary across several orders of magnitude. In [17, 18] we dealt with this issue heuristically, fixing the bound in advance. In this paper, we present a modification of an existing bandit problem solver =-=[6]-=-, which allows it to deal with an unknown bound on losses, while retaining a bound on the expected regret. This allows us to propose a simpler version of the algorithm selection framework GambleTA, or... |

44 | Understanding random sat: Beyond the clauses-to-variables ratio
- Nudelman, Leyton-Brown, et al.
- 2004
(Show Context)
Citation Context ...se predictions, the algorithm expected to obtain the best performance is selected, and used to solve the instance. For LVAs, an implementation of this idea was proposed in [30], and later improved in =-=[35, 49]-=-. In this case, the runtime is predicted, and minimized. For each available algorithm, an empirical hardness model [30] is learned offline, based on the runtimes on several training problem instances:... |

39 | Minimizing regret with label efficient prediction
- Cesa-Bianchi, Lugosi, et al.
- 2005
(Show Context)
Citation Context ...o solve other instances. The decision of whether to “explore” algorithm performance, or “exploit” the greedy schedule, is taken independently for each instance, using a label efficient forecaster =-=[5]-=-, which allows one to bound the regret compared to the offline greedy strategy, and whose complexity is also exponential in N. A different online method is proposed in [46], where a per set optimal task s... |

33 | The design and analysis of an algorithm portfolio for sat - Xu, Hutter, et al. - 2007 |

27 | Combining multiple heuristics online
- Streeter, Golovin, et al.
- 2007
(Show Context)
Citation Context .... Section 2 makes precise some terminology used in the rest of the paper, and discusses related work. Section 3 briefly recalls the time allocators introduced in [17], and describes a more recent one from =-=[47]-=-. Section 4 presents the novel version of GambleTA. Section 5 introduces Exp3LightA, a bandit problem solver for unbounded loss games, along with its bound on regret. Experiments with data from solver c... |

26 |
Cross-disciplinary perspectives on metalearning for algorithm selection
- Smith-Miles
- 2009
(Show Context)
Citation Context ...y on a runtime sample, proving that finding an optimal allocation is itself an NP-hard problem. The runtime data is collected both offline and online (Section 2.3). Further references can be found in =-=[13, 44]-=-. 2.1 Model based selection, per instance Per instance selection is usually based on a predictive model of the performance of each algorithm, conditioned on features of the instance, a set of numerica... |

24 | Dynamic Algorithm Portfolios
- Gagliolo, Schmidhuber
- 2006
(Show Context)
Citation Context ...blem combinations, based on the model’s predictions. This trade-off is typically ignored in offline algorithm selection, and the size of the training set is chosen heuristically. In our previous work =-=[11, 16, 17]-=-, we have instead kept an online view of algorithm selection, in which the only input available to the meta-learner is a set of algorithms, of unknown performance, and a sequence of problem instances ... |

16 |
Reactive Search and Intelligent Optimization
- Battiti, Brunato, et al.
- 2008
(Show Context)
Citation Context ...r, method or technique to refer to the upper-level TA which uses the algorithms in the set. Examples of time allocation which we do not consider here include continuous parameter tuning and control =-=[4]-=-; and sequential composition of algorithms, such as search in program space [12], and anytime algorithm scheduling [24]. ...on this sample, a predictive mo... |

15 | Monte-carlo algorithms in graph isomorphism testing
- BABAI
- 1979
(Show Context)
Citation Context ...s joint probability can be evaluated as the product of the individual survival functions $S_n(s_n t)$: $S_A(t; s) = \prod_{n=1}^{N} S_n(s_n t)$ (2), which, in CDF form, corresponds to $F_A(t; s) = 1 - \prod_{n=1}^{N} [1 - F_n(s_n t)]$ (3). Note that the assumption of independence of the $T_n$, which allows (2) to be expressed as a product, is justified by the fact that the $a_n$ do not interact, and by the use of the actual RTDs of the $a_n$ on ... |
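The product form of the portfolio survival function described in this context is straightforward to compute numerically. A small sketch (function names and the exponential example RTDs are illustrative, not from the paper):

```python
import math

def portfolio_survival(survivals, shares, t):
    """S_A(t; s) = prod_n S_n(s_n * t): the probability that a portfolio of
    independently running algorithms has not yet solved the instance at
    time t, when algorithm n receives a CPU share s_n.

    survivals: per-algorithm survival functions S_n(t)
    shares:    CPU shares s_n, summing to 1
    """
    p = 1.0
    for S_n, s_n in zip(survivals, shares):
        p *= S_n(s_n * t)   # algorithm n has effectively run for s_n * t
    return p

def portfolio_cdf(survivals, shares, t):
    """F_A(t; s) = 1 - S_A(t; s): the portfolio's runtime CDF."""
    return 1.0 - portfolio_survival(survivals, shares, t)
```

For instance, two exponential RTDs with rates 1 and 2 sharing a CPU equally give S_A(2) = e^{-1} · e^{-2} = e^{-3}.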

13 | Learning restart strategies
- Gagliolo, Schmidhuber
- 2007
(Show Context)
Citation Context ...red adversarial. As BPS typically minimize the regret with respect to a single arm, this approach would allow one to implement per set selection of the overall best algorithm. An example can be found in =-=[18]-=-, where we presented an online method for learning a per set estimate of an optimal restart strategy. Unfortunately, per set selection is only profitable if one of the algorithms dominates the others ... |

Parameter adjustment based on performance prediction: Towards an instance aware problem solver
- Hutter, Hamadi
- 2005
(Show Context)
Citation Context ...algorithm (model selection). Time allocation can be performed once for a whole set of problem instances, or repeated independently for each instance (per set and per instance allocation, respectively =-=[26]-=-). Another independent classification can be made between static and dynamic schedules [36]. Static schedules are stationary, and are set before starting any an. Dynamic schedules can be a function of t... |

11 |
Computational Tradeoffs under Bounded Resources
- Horvitz, Zilberstein
(Show Context)
Citation Context ...location which we do not consider here include continuous parameter tuning and control [4]; and sequential composition of algorithms, such as search in program space [12], and anytime algorithm scheduling =-=[24]-=-. ...on this sample, a predictive model of performance is learned, mapping (instance, algorithm) pairs to the expected performance: in practice, this ... |

10 |
Combining multiple heuristics
- Sayag, Fine, et al.
- 2006
(Show Context)
Citation Context ...portfolios [25] are a generalization of single algorithm selection, in which computation time is shared among the available algorithms: in this case, the result of the selection process is a schedule =-=[42]-=-, according to which the algorithms are executed. The objective of selection depends on the application at hand. In this paper, we consider the problem of allocating computation time to a set of Las V... |

9 | New techniques for algorithm portfolio design
- Streeter, Smith
- 2008
(Show Context)
Citation Context ...ays coincides with the number of instances. Winner is the a-priori unknown best algorithm in the competition. Uniform is the portfolio of all competing algorithms, sharing resources equally. OnG-Exp3 =-=[46]-=- is another online portfolio approach, starting from scratch like GambleTA. See Figs. 1–11 for detailed results, including average runtimes. ...Runtime of Winner, on each... |

8 | Learning Parallel Portfolios of Algorithms
- Petrik
- 2005
(Show Context)
Citation Context ...blem instances, or repeated independently for each instance (per set and per instance allocation, respectively [26]). Another independent classification can be made between static and dynamic schedules =-=[36]-=-. Static schedules are stationary, and are set before starting any an. Dynamic schedules can be a function of time s = s(t), i.e., they can change while the algorithms are being executed. Rather than ... |

7 | Optimal schedules for parallelizing anytime algorithms: The case of shared resources
- Finkelstein, Markovitch, et al.
- 2003
(Show Context)
Citation Context ...efer to a set of N algorithms A = {a1, ..., aN}; and to a set of M problem instances B = {b1, ..., bM}. To solve these, the algorithms are executed on one or more processors, according to a schedule s ∈ S =-=[8]-=-. The role of the TA is precisely that of generating a schedule s, which can be used to solve B using A. Existing TAs may differ in the way the schedule s is represented, as well as in its allowed values... |

7 |
Longitudinal Data Analysis
- Fitzmaurice, Davidian, et al.
- 2008
(Show Context)
Citation Context ...ng the solution quality distribution (SQD) for an arbitrary runtime value [23]. In statistical terminology, this is an example of longitudinal data, which can be described using mixed effects models =-=[10]-=-. In [15] we presented preliminary experiments, showing that nonlinear mixed-effects models can be used to predict the performance of optimization algorithms. The issue with such models is their... |

6 | Hannan consistency in on-line learning in case of unbounded losses under partial monitoring
- Allenberg, Auer, et al.
- 2006
(Show Context)
Citation Context ...ounded losses have recently been obtained. In [6, 7], the authors consider a full information game, and provide two algorithms which can adapt to unknown bounds on signed rewards. Based on this work, =-=[1]-=- provide a Hannan consistent algorithm for losses whose bound grows in the number of trials i with a known rate $i^\nu$, $\nu < 1/2$. This latter hypothesis does not fit our situation well, as we would like to... |

6 | A neural network model for inter-problem adaptive online time allocation
- Gagliolo, Schmidhuber
- 2005
(Show Context)
Citation Context ...blem combinations, based on the model’s predictions. This trade-off is typically ignored in offline algorithm selection, and the size of the training set is chosen heuristically. In our previous work =-=[11, 16, 17]-=-, we have instead kept an online view of algorithm selection, in which the only input available to the meta-learner is a set of algorithms, of unknown performance, and a sequence of problem instances ... |

5 | Adaptive online time allocation to search algorithms
- Gagliolo, Zhumatiy, et al.
- 2004
(Show Context)
Citation Context ...blem combinations, based on the model’s predictions. This trade-off is typically ignored in offline algorithm selection, and the size of the training set is chosen heuristically. In our previous work =-=[11, 16, 17]-=-, we have instead kept an online view of algorithm selection, in which the only input available to the meta-learner is a set of algorithms, of unknown performance, and a sequence of problem instances ... |

5 | Algorithm selection as a bandit problem with unbounded losses
- Gagliolo, Schmidhuber
(Show Context)
Citation Context ...heuristic reward attribution instead of using the plain runtimes. GambleTA4 is therefore less sound than GambleTA: however, compared on a toy problem, the two versions displayed a similar performance =-=[20]-=-. 5 Unbounded losses A common issue of the above approaches is the difficulty of setting reasonable upper bounds on the time required by the algorithms. This renders a straightforward application of m... |

5 |
Nonparametric estimation from incomplete observations

- Kaplan, Meier
- 1958
(Show Context)
Citation Context ...directories were used, we will compare instead with the per set version of their method, presenting experiments where no features are used: to model the RTD, we simply used the Kaplan-Meier estimator =-=[27]-=-. For TAEt, TAQ, and TAC, the RTD of the portfolio was evaluated as in (2), based on the survival functions of each algorithm, estimated by the models. ...Table 1: Summary of results: instances solved ... |
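The Kaplan-Meier product-limit estimator [27] mentioned here estimates the survival function of the RTD from a runtime sample in which some runs were censored, i.e., stopped before a solution was found. A minimal sketch (the interface is illustrative; production code would use a survival-analysis library):

```python
def kaplan_meier(times, solved):
    """Product-limit estimate of the survival function S(t) from a
    right-censored runtime sample.

    times:  observed runtimes (solution time, or cutoff time if censored)
    solved: parallel flags; True iff the run actually solved the instance
    Returns a list of (t, S(t)) steps at the observed solution times.
    """
    # At tied times, process solutions before censorings (the usual convention).
    events = sorted(zip(times, solved), key=lambda e: (e[0], not e[1]))
    n_at_risk = len(events)
    surv = 1.0
    steps = []
    for t, did_solve in events:
        if did_solve:                       # censored runs only shrink the risk set
            surv *= 1.0 - 1.0 / n_at_risk   # S(t) *= (1 - d_i / n_i)
            steps.append((t, surv))
        n_at_risk -= 1
    return steps
```

Unlike the empirical CDF, the estimator stays consistent when runs are cut off early, which is exactly the situation created by a time allocator that stops algorithms before they finish.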

4 |
Faster learning through a probabilistic approximation algorithm
- Kolen
- 1988
(Show Context)
Citation Context ...time [9]. The problem of estimating the RTDs is not tackled, so this line of work remains at a theoretical level. While the basic idea can be traced back to the field of global optimization (see e.g. =-=[29]-=-, [34]), it is first applied to search algorithms by [25], who borrow the term portfolio from finance, to point out that the method can reduce the “risk” of wasting computation time. A theoretical and... |

3 |
Statistically Optimal Combination of Algorithms. Presented at SOFSEM
- Petrik
- 2005
(Show Context)
Citation Context ...33], under the assumption that the distribution generating instances remains the same. The problem of optimizing the share is found to be NP-hard. Dynamic resource sharing schedules are considered in =-=[36, 37]-=-. The schedules can change at a finite set of equally spaced time values, and the last allocation is kept indefinitely, resulting in a total of b time intervals. The problem of selecting the schedule ... |

3 | Using Online Algorithms to Solve NP-hard Problems more Efficiently in Practice
- Streeter
- 2007
(Show Context)
Citation Context ...ces Exp3LightA, a bandit problem solver for unbounded loss games, along with its bound on regret. Experiments with data from solver competitions are reported in Section 6, comparing with results from =-=[45]-=-. Section 7 concludes the paper. 2 Related work Before presenting related research, it will be useful to make precise the meaning of a few key concepts: whenever possible, the most widely used term has bee... |

2 | Towards distributed algorithm portfolios
- Gagliolo, Schmidhuber
- 2008
(Show Context)
Citation Context ...ptimization process multiple times, with different random initial values for s. Examples of these surfaces for N = 2 algorithms are reported in [13, 17]. The allocation of multiple CPUs is derived in =-=[19]-=-. In the literature, the instance RTDs are assumed to be available a priori. Aiming at a practical implementation of our methods, we looked at the task of estimating such distributions, finding a vast... |

1 |
Universal search
- Gagliolo
- 2007
(Show Context)
Citation Context ...ithms in the set. Examples of time allocation which we do not consider here include continuous parameter tuning and control [4]; and sequential composition of algorithms, such as search in program space =-=[12]-=-, and anytime algorithm scheduling [24]. ...on this sample, a predictive model of performance is learned, mapping (instance, algorithm) pairs to the e... |

1 | Online dynamic algorithm portfolios
- Gagliolo
- 2010
(Show Context)
Citation Context ...y on a runtime sample, proving that finding an optimal allocation is itself an NP-hard problem. The runtime data is collected both offline and online (Section 2.3). Further references can be found in =-=[13, 44]-=-. 2.1 Model based selection, per instance Per instance selection is usually based on a predictive model of the performance of each algorithm, conditioned on features of the instance, a set of numerica... |

Algorithm survival analysis
- Gagliolo, Legrand
- 2010
(Show Context)
Citation Context ... expected value of the regret of Exp3Light-A(K, M) is bounded as: $E\{L_E(M)\} - L^*(M) \le 4\sqrt{3\lceil\log_2 L\rceil L(\log K + K\log M)K\,L^*(M)} + 2\lceil\log_2 L\rceil L\left[\sqrt{4L(\log K + K\log M)K}\right] + (2K + 1)(1 + \log_4(3M + 1)) + 2$ (14). The proof is given in the Appendix. The regret obtained by Exp3Light-A is $O(K(\sqrt{L L^*(M)\log L\log M} + L\log L(\sqrt{L\log M} + \log M)))$: comparing with the regret of the original Exp3Light with a known... |

Mixed-effects modeling of optimisation algorithm performance
- Gagliolo, Legrand, et al.
- 2009
(Show Context)
Citation Context ...lution quality distribution (SQD) for an arbitrary runtime value [23]. In statistical terminology, this is an example of longitudinal data, which can be described using mixed effects models [10]. In =-=[15]-=- we presented preliminary experiments, showing that nonlinear mixed-effects models can be used to predict the performance of optimization algorithms. The issue with such models is their... |

Parallel trials versus single search in supervised learning
- Muselli, Rabbia
- 1991
(Show Context)
Citation Context ...9]. The problem of estimating the RTDs is not tackled, so this line of work remains at a theoretical level. While the basic idea can be traced back to the field of global optimization (see e.g. [29], =-=[34]-=-), it is first applied to search algorithms by [25], who borrow the term portfolio from finance, to point out that the method can reduce the “risk” of wasting computation time. A theoretical and empir... |