Results 1–10 of 46
Representation Learning: A Review and New Perspectives, 2012
Abstract

Cited by 152 (4 self)
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep architectures. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms
Abstract

Cited by 27 (8 self)
Many different machine learning algorithms exist; taking into account each algorithm’s hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately. We show that this problem can be addressed by a fully automated approach, leveraging recent innovations in Bayesian optimization. Specifically, we consider a wide range of feature selection techniques (combining 3 search and 8 evaluator methods) and all classification approaches implemented in WEKA’s standard distribution, spanning 2 ensemble methods, 10 meta-methods, 27 base classifiers, and hyperparameter settings for each classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup 09, variants of the MNIST dataset, and CIFAR-10, we show classification performance often much better than that achieved using standard selection and hyperparameter optimization methods. We hope that our approach will help non-expert users to more effectively identify machine learning algorithms and hyperparameter settings appropriate to their applications, and hence to achieve improved performance.
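The combined algorithm-selection and hyperparameter-optimization search space this abstract describes can be sketched in a few lines. The classifier names, hyperparameter ranges, and synthetic loss below are illustrative assumptions, and plain random search stands in for the Bayesian optimizer the paper actually uses; the point is only the shape of the hierarchical (algorithm, hyperparameters) space:

```python
# A minimal sketch of a combined algorithm-selection + hyperparameter
# search space. Everything here (algorithm names, ranges, the synthetic
# loss) is illustrative -- not the WEKA space or the paper's optimizer.
import random

SEARCH_SPACE = {
    "knn":  {"n_neighbors": range(1, 31)},
    "tree": {"max_depth": range(1, 21)},
    "svm":  {"C": [10 ** e for e in range(-3, 4)]},
}

# Synthetic stand-in for cross-validated error: each algorithm has a
# different base error and a different best hyperparameter setting.
BASE_ERROR = {"knn": 0.10, "tree": 0.08, "svm": 0.12}
TARGET = {"knn": ("n_neighbors", 7), "tree": ("max_depth", 5), "svm": ("C", 1.0)}

def synthetic_loss(algo, params):
    name, best_value = TARGET[algo]
    return BASE_ERROR[algo] + 0.01 * abs(params[name] - best_value)

def random_search(budget=50, seed=0):
    # Jointly samples an algorithm and its hyperparameters each iteration.
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        algo = rng.choice(sorted(SEARCH_SPACE))
        params = {k: rng.choice(list(v)) for k, v in SEARCH_SPACE[algo].items()}
        loss = synthetic_loss(algo, params)
        if best is None or loss < best[0]:
            best = (loss, algo, params)
    return best

best_loss, best_algo, best_params = random_search()
print(best_algo, best_params, round(best_loss, 3))
```

In a real system, `synthetic_loss` would be replaced by training the chosen classifier and measuring cross-validated error, and the random sampler by a model-based optimizer such as SMAC or TPE.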
Practical recommendations for gradient-based training of deep architectures
Neural Networks: Tricks of the Trade, 2013
Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures
Abstract

Cited by 18 (4 self)
Many computer vision algorithms depend on configuration settings that are typically hand-tuned in the course of evaluating the algorithm for a particular data set. While such parameter tuning is often presented as being incidental to the algorithm, correctly setting these parameter choices is frequently critical to realizing a method’s full potential. Compounding matters, these parameters often must be re-tuned when the algorithm is applied to a new problem domain, and the tuning process itself often depends on personal experience and intuition in ways that are hard to quantify or describe. Since the performance of a given technique depends on both the fundamental quality of the algorithm and the details of its tuning, it is sometimes difficult to know whether a given technique is genuinely better, or simply better tuned. In this work, we propose a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace hand-tuning with a reproducible and unbiased optimization process.
Bayesian Optimization in High Dimensions via Random Embeddings
Abstract

Cited by 11 (6 self)
Bayesian optimization techniques have been successfully applied to robotics, planning, sensor placement, recommendation, advertising, intelligent user interfaces, and automatic algorithm configuration. Despite these successes, the approach is restricted to problems of moderate dimension, and several workshops on Bayesian optimization have identified its scaling to high dimensions as one of the holy grails of the field. In this paper, we introduce a novel random embedding idea to attack this problem. The resulting Random EMbedding Bayesian Optimization (REMBO) algorithm is very simple and applies to domains with both categorical and continuous variables. The experiments demonstrate that REMBO can effectively solve high-dimensional problems, including automatic parameter configuration of a popular mixed integer linear programming solver.
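The random-embedding idea can be sketched as follows: draw a random Gaussian matrix A, search only over a low-dimensional point z, and evaluate the high-dimensional objective at x = clip(A z). The objective, the dimensions, and the random-search inner loop below are illustrative assumptions; REMBO itself runs full Bayesian optimization in the embedded space:

```python
# A minimal sketch of the random-embedding idea: the ambient space has
# D = 100 dimensions, but the (assumed, illustrative) objective only
# depends on two coordinates, so searching a random 2-D subspace suffices.
import random

D, d_low = 100, 2  # ambient and embedding dimensionality

def objective(x):
    # Hidden low effective dimensionality: only coordinates 3 and 40 matter.
    return (x[3] - 0.2) ** 2 + (x[40] + 0.5) ** 2

rng = random.Random(0)
# Random Gaussian embedding matrix A (D x d_low).
A = [[rng.gauss(0, 1) for _ in range(d_low)] for _ in range(D)]

def embed(z):
    # Map a low-dimensional point into the ambient box [-1, 1]^D.
    return [max(-1.0, min(1.0, sum(a * zi for a, zi in zip(row, z))))
            for row in A]

# Random search in the low-dimensional space stands in for the Bayesian
# optimizer REMBO actually uses there.
best = min(
    ([rng.uniform(-2, 2) for _ in range(d_low)] for _ in range(500)),
    key=lambda z: objective(embed(z)),
)
print(round(objective(embed(best)), 4))
```

The key property is that the expensive D-dimensional function is only ever queried at embedded points, while the search itself happens in d_low dimensions.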
Towards an empirical foundation for assessing Bayesian optimization of hyperparameters
In NIPS Workshop on Bayesian Optimization in Theory and Practice, 2013
Abstract

Cited by 9 (3 self)
Progress in practical Bayesian optimization is hampered by the fact that the only available standard benchmarks are artificial test functions that are not representative of practical applications. To alleviate this problem, we introduce a library of benchmarks from the prominent application of hyperparameter optimization and use it to compare Spearmint, TPE, and SMAC, three recent Bayesian optimization methods for hyperparameter optimization.
On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning
Auto-WEKA: Automated selection and hyperparameter optimization of classification algorithms, 2012
Abstract

Cited by 7 (0 self)
There exists a large variety of machine learning algorithms; as most of these can be configured via hyperparameters, there is a staggeringly large number of possible alternatives overall. There has been a considerable amount of previous work on choosing among learning algorithms and, separately, on optimizing hyperparameters (mostly when these are continuous and very few in number) in a given use context. However, we are aware of no work that addresses both problems together. Here, we demonstrate the feasibility of using a fully automated approach for choosing both a learning algorithm and its hyperparameters, leveraging recent innovations in Bayesian optimization. Specifically, we apply this approach to the full range of classifiers implemented in WEKA, spanning 3 ensemble methods, 14 meta-methods, 30 base classifiers, and a wide range of hyperparameter settings for each of these. On each of 10 popular data sets from the UCI repository, we show classification performance better than that of complete cross-validation over the default hyperparameter settings of our 47 classification algorithms. We believe that our approach, which we dubbed Auto-WEKA, will enable typical users of machine learning algorithms to make better choices and thus to obtain better performance in a fully automated fashion.
Large-Scale Optimization of Hierarchical Features for Saliency Prediction in Natural Images
Abstract

Cited by 7 (0 self)
Saliency prediction typically relies on hand-crafted (multi-scale) features that are combined in different ways to form a “master” saliency map, which encodes local image conspicuity. Recent improvements to the state of the art on standard benchmarks such as MIT1003 have been achieved mostly by incrementally adding more and more hand-tuned features (such as car or face detectors) to existing models [18, 4, 22, 34]. In contrast, we here follow an entirely automatic data-driven approach that performs a large-scale search for optimal features. We identify those instances of a richly parameterized bio-inspired model family (hierarchical neuromorphic networks) that successfully predict image saliency. Because of the high dimensionality of this parameter space, we use automated hyperparameter optimization to efficiently guide the search. The optimal blend of such multi-layer features combined with a simple linear classifier achieves excellent performance on several image saliency benchmarks. Our models outperform the state of the art on MIT1003, on which features and classifiers are learned. Without additional training, these models generalize well to two other image saliency data sets, Toronto and NUSEF, despite their different image content. Finally, our algorithm scores best of all the 23 models evaluated to date on the MIT300 saliency challenge [16], which uses a hidden test set to facilitate an unbiased comparison.
Bayesian Multi-Scale Optimistic Optimization
Abstract

Cited by 5 (3 self)
Bayesian optimization is a powerful global optimization technique for expensive black-box functions. One of its shortcomings is that it requires auxiliary optimization of an acquisition function at each iteration. This auxiliary optimization can be costly and very hard to carry out in practice. Moreover, it creates serious theoretical concerns, as most of the convergence results assume that the exact optimum of the acquisition function can be found. In this paper, we introduce a new technique for efficient global optimization that combines Gaussian process confidence bounds and treed simultaneous optimistic optimization to eliminate the need for auxiliary optimization of acquisition functions. The experiments with global optimization benchmarks and a novel application to automatic information extraction demonstrate that the resulting technique is more efficient than the two approaches from which it draws inspiration. Unlike most theoretical analyses of Bayesian optimization with Gaussian processes, our finite-time convergence rate proofs do not require exact optimization of an acquisition function. That is, our approach eliminates the unsatisfactory assumption that a difficult, potentially NP-hard, problem has to be solved in order to obtain vanishing regret rates.
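The “auxiliary optimization” this abstract refers to is the per-iteration maximization of an acquisition function such as the Gaussian-process upper confidence bound (UCB). A minimal sketch, assuming an RBF kernel, a fixed noise level, and a dense candidate grid as the (naive) inner optimizer that this line of work aims to eliminate:

```python
# A minimal, self-contained 1-D GP surrogate whose UCB acquisition is
# maximized over a dense grid of candidates. Kernel, noise level, beta,
# and grid are illustrative assumptions, not the paper's method.
import math

def rbf(a, b, length=0.3):
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(mat, rhs):
    # Naive Gaussian elimination with partial pivoting, for the small
    # GP linear systems below.
    n = len(mat)
    m = [row[:] + [r] for row, r in zip(mat, rhs)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def gp_ucb(xs, ys, candidates, beta=2.0, noise=1e-6):
    # Posterior mean k^T K^-1 y and variance k(x,x) - k^T K^-1 k, scored
    # with UCB = mean + beta * std over every candidate (the auxiliary
    # optimization step).
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    best, best_score = None, -float("inf")
    for x in candidates:
        k = [rbf(x, xi) for xi in xs]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
        score = mean + beta * math.sqrt(max(var, 0.0))
        if score > best_score:
            best, best_score = x, score
    return best

xs, ys = [0.0, 0.5, 1.0], [0.1, 0.9, 0.2]  # observations so far
grid = [i / 100 for i in range(101)]
print(gp_ucb(xs, ys, grid))
```

With a single observation, UCB is maximized far from it (pure exploration), which illustrates why the inner search has to cover the whole domain at every iteration; the paper's technique replaces this exhaustive step with optimistic space partitioning.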