Results 1  10
of
95
Practical bayesian optimization of machine learning algorithms
, 2012
"... In this section we specify additional details of our Bayesian optimization algorithm which, for brevity, were omitted from the paper. For more detail, the code used in this work is made publicly available at ..."
Abstract

Cited by 112 (14 self)
 Add to MetaCart
(Show Context)
In this section we specify additional details of our Bayesian optimization algorithm which, for brevity, were omitted from the paper. For more detail, the code used in this work is made publicly available at
Random search for hyperparameter optimization
 In: Journal of Machine Learning Research
"... Grid search and manual search are the most widely used strategies for hyperparameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyperparameter optimization than trials on a grid. Empirical evidence comes from a comparison with a ..."
Abstract

Cited by 96 (15 self)
 Add to MetaCart
(Show Context)
Grid search and manual search are the most widely used strategies for hyperparameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyperparameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyperparameters to validation set performance reveals that for most data sets only a few of the hyperparameters really matter, but that different hyperparameters are important on different data sets. This phenomenon makes
Algorithms for hyperparameter optimization
 In NIPS
, 2011
"... Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyperparameter optimization has been the job of humans because they can be very efficient ..."
Abstract

Cited by 44 (9 self)
 Add to MetaCart
(Show Context)
Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyperparameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors make it possible to run more trials and we show that algorithmic approaches can find better results. We present hyperparameter optimization results on tasks of training neural networks and deep belief networks (DBNs). We optimize hyperparameters using random search and two new greedy sequential methods based on the expected improvement criterion. Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs. The sequential algorithms are applied to the most difficult DBN learning problems from [1] and find significantly better results than the best previously reported. This work contributes novel techniques for making response surface models P (yx) in which many elements of hyperparameter assignment (x) are known to be irrelevant given particular values of other elements. 1
AutoWEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms
"... Many different machine learning algorithms exist; taking into account each algorithm’s hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previo ..."
Abstract

Cited by 26 (8 self)
 Add to MetaCart
(Show Context)
Many different machine learning algorithms exist; taking into account each algorithm’s hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately. We show that this problem can be addressed by a fully automated approach, leveraging recent innovations in Bayesian optimization. Specifically, we consider a wide range of feature selection techniques (combining 3 search and 8 evaluator methods) and all classification approaches implemented in WEKA’s standard distribution, spanning 2 ensemble methods, 10 metamethods, 27 base classifiers, and hyperparameter settings for each classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup 09, variants of the MNIST dataset and CIFAR10, we show classification performance often much better than using standard selection and hyperparameter optimization methods. We hope that our approach will help nonexpert users to more effectively identify machine learning algorithms and hyperparameter settings appropriate to their applications, and hence to achieve improved performance.
Practical recommendations for gradientbased training of deep architectures
 Neural Networks: Tricks of the Trade
, 2013
"... ar ..."
(Show Context)
Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures
"... Many computer vision algorithms depend on configuration settings that are typically handtuned in the course of evaluating the algorithm for a particular data set. While such parameter tuning is often presented as being incidental to the algorithm, correctly setting these parameter choices is freque ..."
Abstract

Cited by 15 (3 self)
 Add to MetaCart
(Show Context)
Many computer vision algorithms depend on configuration settings that are typically handtuned in the course of evaluating the algorithm for a particular data set. While such parameter tuning is often presented as being incidental to the algorithm, correctly setting these parameter choices is frequently critical to realizing a method’s full potential. Compounding matters, these parameters often must be retuned when the algorithm is applied to a new problem domain, and the tuning process itself often depends on personal experience and intuition in ways that are hard to quantify or describe. Since the performance of a given technique depends on both the fundamental quality of the algorithm and the details of its tuning, it is sometimes difficult to know whether a given technique is genuinely better, or simply better tuned. In this work, we propose a metamodeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace handtuning with a reproducible and unbiased optimizaProceedings of the 30 th
Algorithm runtime prediction: Methods and evaluation
 Artificial Intelligence J
, 2014
"... Perhaps surprisingly, it is possible to predict how long an algorithm will take to run on a previously unseen input, using machine learning techniques to build a model of the algorithm’s runtime as a function of problemspecific instance features. Such models have important applications to algorithm ..."
Abstract

Cited by 15 (4 self)
 Add to MetaCart
Perhaps surprisingly, it is possible to predict how long an algorithm will take to run on a previously unseen input, using machine learning techniques to build a model of the algorithm’s runtime as a function of problemspecific instance features. Such models have important applications to algorithm analysis, portfoliobased algorithm selection, and the automatic configuration of parameterized algorithms. Over the past decade, a wide variety of techniques have been studied for building such models. Here, we describe extensions and improvements of existing models, new families of models, and— perhaps most importantly—a much more thorough treatment of algorithm parameters as model inputs. We also comprehensively describe new and existing features for predicting algorithm runtime for propositional satisfiability (SAT), travelling salesperson (TSP) and mixed integer programming (MIP) problems. We evaluate these innovations through the largest empirical analysis of its kind, comparing to a wide range of runtime modelling techniques from the literature. Our experiments consider 11 algorithms and 35 instance distributions; they also span a very wide range of SAT, MIP, and TSP instances, with the least structured having been generated uniformly at random and the most structured having emerged from real industrial applications. Overall, we demonstrate that our new models yield substantially better runtime predictions than previous approaches in terms of their generalization to new problem instances, to new algorithms from a parameterized space, and to both simultaneously.
Bayesian Optimization in High Dimensions via Random Embeddings
"... Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several ..."
Abstract

Cited by 10 (6 self)
 Add to MetaCart
Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple and applies to domains with both categorical and continuous variables. The experiments demonstrate that REMBO can effectively solve highdimensional problems, including automatic parameter configuration of a popular mixed integer linear programming solver.
Parallel Algorithm Configuration
"... Abstract. Stateoftheart algorithms for solving hard computational problems often expose many parameters whose settings critically affect empirical performance. Manually exploring the resulting combinatorial space of parameter settings is often tedious and unsatisfactory. Automated approaches for ..."
Abstract

Cited by 8 (7 self)
 Add to MetaCart
(Show Context)
Abstract. Stateoftheart algorithms for solving hard computational problems often expose many parameters whose settings critically affect empirical performance. Manually exploring the resulting combinatorial space of parameter settings is often tedious and unsatisfactory. Automated approaches for finding good parameter settings are becoming increasingly prominent and have recently lead to substantial improvements in the state of the art for solving a variety of computationally challenging problems. However, running such automated algorithm configuration procedures is typically very costly, involving many thousands of invocations of the algorithm to be configured. Here, we study the extent to which parallel computing can come to the rescue. We compare straightforward parallelization by multiple independent runs with a more sophisticated method of parallelizing the modelbased configuration procedure SMAC. Empirical results for configuring the MIP solver CPLEX demonstrate that nearoptimal speedups can be obtained with up to 16 parallel workers, and that 64 workers can still accomplish challenging configuration tasks that previously took 2 days in 1–2 hours. Overall, we show that our methods make effective use of largescale parallel resources and thus substantially expand the practical applicability of algorithm configuration methods. 1
Input Warping for Bayesian Optimization of NonStationary Functions
"... Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions. The ability to accurately model distributions over functions is critical to the effectiveness of Bayesian optimization. Although Gaussian processes pr ..."
Abstract

Cited by 8 (2 self)
 Add to MetaCart
(Show Context)
Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions. The ability to accurately model distributions over functions is critical to the effectiveness of Bayesian optimization. Although Gaussian processes provide a flexible prior over functions, there are various classes of functions that remain difficult to model. One of the most frequently occurring of these is the class of nonstationary functions. The optimization of the hyperparameters of machine learning algorithms is a problem domain in which parameters are often manually transformed a priori, for example by optimizing in “logspace, ” to mitigate the effects of spatiallyvarying length scale. We develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function. We further extend the warping framework to multitask Bayesian optimization so that multiple tasks can be warped into a jointly stationary space. On a set of challenging benchmark optimization tasks, we observe that the inclusion of warping greatly improves on the stateoftheart, producing better results faster and more reliably.