Results 1 – 8 of 8
Fast Image Tagging
Abstract

Cited by 10 (2 self)
Automatic image annotation is a difficult and highly relevant machine learning task. Recent advances have significantly improved the state-of-the-art in retrieval accuracy with algorithms based on nearest neighbor classification in carefully learned metric spaces. But this comes at a price of increased computational complexity during training and testing. We propose FastTag, a novel algorithm that achieves comparable results with two simple linear mappings that are co-regularized in a joint convex loss function. The loss function can be efficiently optimized in closed form updates, which allows us to incorporate a large number of image descriptors cheaply. On several standard real-world benchmark data sets, we demonstrate that FastTag matches the current state-of-the-art in tagging quality, yet reduces the training and testing times by several orders of magnitude and has lower asymptotic complexity.
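The closed-form flavor of the approach can be illustrated with a toy alternating ridge scheme. This is a hedged sketch of the general idea (two co-regularized linear maps, each solved in closed form), not the authors' exact objective; all dimensions and the regularizer `lam` are made up.

```python
import numpy as np

# Toy sketch in the spirit of the abstract (NOT the authors' exact
# objective): two linear maps, W from image descriptors to tag scores
# and B from observed tags to "enriched" tag targets, fit by alternating
# closed-form ridge solves. All sizes and the regularizer lam are made up.
rng = np.random.default_rng(0)
n, d, t = 200, 50, 10                 # samples, feature dim, tags
X = rng.normal(size=(n, d))           # image descriptors
Y = (rng.random((n, t)) < 0.2).astype(float)  # sparse observed tags
lam = 1.0                             # hypothetical ridge regularizer

B = np.eye(t)                         # start from the identity tag map
for _ in range(5):
    Z = Y @ B.T                       # current enriched tag targets
    # Closed-form ridge solve for W: (X^T X + lam I) W = X^T Z
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)
    P = X @ W                         # predicted tag scores
    # Closed-form solve for B: min ||P - Y B^T||^2 + lam ||B||^2
    B = np.linalg.solve(Y.T @ Y + lam * np.eye(t), Y.T @ P).T

scores = X @ W                        # high entries = predicted tags
print(scores.shape)                   # (200, 10)
```

Each update is a single regularized linear solve, which is what makes closed-form alternation cheap enough to scale to many descriptors.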
The dropout learning algorithm
 Artificial Intelligence
Abstract

Cited by 10 (2 self)
Dropout is a recently introduced algorithm for training neural networks by randomly dropping units during training to prevent their co-adaptation. A mathematical analysis of some of the static and dynamic properties of dropout is provided using Bernoulli gating variables, general enough to accommodate dropout on units or connections, and with variable rates. The framework allows a complete analysis of the ensemble averaging properties of dropout in linear networks, which is useful for understanding the nonlinear case. The ensemble averaging properties of dropout in nonlinear logistic networks result from three fundamental equations: (1) the approximation of the expectations of logistic functions by normalized geometric means, for which bounds and estimates are derived; (2) the algebraic equality of normalized geometric means of logistic functions with the logistic of the means, which mathematically characterizes logistic functions; and (3) the linearity of the means with respect to sums, as well as products of independent variables. The results are also extended to other classes of transfer functions, including rectified linear functions. Approximation errors tend to cancel each other and do not accumulate. Dropout can also be connected to stochastic neurons and used to predict firing rates, and to backpropagation by viewing the backward propagation as ensemble averaging in a dropout linear network. Moreover, the convergence properties of dropout can be understood in terms of stochastic gradient descent. Finally, for the regularization properties of dropout, the expectation of the dropout gradient is the gradient of the corresponding approximation ensemble, regularized by an adaptive weight decay term with a propensity for self-consistent variance minimization and sparse representations.
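Equation (2) above — the normalized geometric mean of logistic values equals the logistic of the mean input — can be checked numerically. The sketch below is an illustration of that identity only; the random inputs are invented.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Numerical check of equation (2) above: the normalized geometric mean
# (NWGM) of logistic values equals the logistic of the mean input.
rng = np.random.default_rng(1)
x = rng.normal(size=1000)              # e.g. pre-activations under dropout
p = sigmoid(x)

g  = np.exp(np.mean(np.log(p)))        # geometric mean of sigmoid(x_i)
gc = np.exp(np.mean(np.log(1 - p)))    # geometric mean of 1 - sigmoid(x_i)
nwgm = g / (g + gc)                    # normalized geometric mean

print(np.allclose(nwgm, sigmoid(np.mean(x))))  # True
```

The identity is exact for the logistic function because log p - log(1-p) = x, so g/gc = exp(mean(x)) and the normalization recovers sigmoid(mean(x)).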
Transformation pursuit for image classification
 In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014
Abstract

Cited by 5 (0 self)
A simple approach to learning invariances in image classification consists in augmenting the training set with transformed versions of the original images. However, given a large set of possible transformations, selecting a compact subset is challenging. Indeed, not all transformations are equally informative, and adding uninformative transformations increases training time with no gain in accuracy. We propose a principled algorithm – Image Transformation Pursuit (ITP) – for the automatic selection of a compact set of transformations. ITP works in a greedy fashion, selecting at each iteration the transformation that yields the highest accuracy gain. ITP also makes it possible to efficiently explore complex transformations that combine basic transformations. We report results on two public benchmarks: the CUB dataset of bird images and the ImageNet 2010 challenge. Using Fisher Vector representations, we achieve an improvement from 28.2% to 45.2% in top-1 accuracy on CUB, and an improvement from 70.1% to 74.9% in top-5 accuracy on ImageNet. We also show significant improvements for deep convnet features: from 47.3% to 55.4% on CUB and from 77.9% to 81.4% on ImageNet.
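The greedy selection loop can be sketched as follows. The shape of the loop follows the abstract; `evaluate` is a hypothetical stand-in for "train with this augmentation set and measure validation accuracy", and the toy gains below are invented for illustration.

```python
# Greedy selection sketch of Image Transformation Pursuit (ITP). The
# shape of the loop follows the abstract; `evaluate` is a hypothetical
# stand-in for "train with this augmentation set and measure validation
# accuracy", and the toy gains below are invented for illustration.

def transformation_pursuit(candidates, evaluate, budget=3):
    selected = []
    best_acc = evaluate(selected)
    for _ in range(budget):
        gains = {t: evaluate(selected + [t]) - best_acc
                 for t in candidates if t not in selected}
        t_best, gain = max(gains.items(), key=lambda kv: kv[1])
        if gain <= 0:                 # stop once nothing still helps
            break
        selected.append(t_best)
        best_acc += gain
    return selected

# Toy evaluation: pretend "flip" and "crop" help, the others do not.
toy_gain = {"flip": 0.04, "crop": 0.02, "invert": -0.01, "jitter": 0.0}
acc = lambda ts: 0.5 + sum(toy_gain[t] for t in ts)
print(transformation_pursuit(["flip", "crop", "invert", "jitter"], acc))
# ['flip', 'crop']
```

In practice each `evaluate` call is a full (or incremental) training run, which is why selecting a compact subset greedily matters.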
Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets
Abstract

Cited by 2 (0 self)
Hierarchical Bayesian networks and neural networks with stochastic hidden units are commonly perceived as two separate types of models. We show that either type of model can often be transformed into an instance of the other by switching between centered and differentiable non-centered parameterizations of the latent variables. The choice of parameterization greatly influences the efficiency of gradient-based posterior inference; we show that the two are often complementary, clarify when each parameterization is preferred, and show how inference can be made robust. In the non-centered form, a simple Monte Carlo estimator of the marginal likelihood can be used for learning the parameters. Theoretical results are supported by experiments.
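The switch the abstract refers to can be illustrated for a single Gaussian latent variable; the values below are made up.

```python
import numpy as np

# Illustration of the centered vs. differentiable non-centered
# parameterization of a Gaussian latent variable (values are made up).
rng = np.random.default_rng(2)
mu, sigma = 1.5, 0.7

# Centered: z is drawn directly from N(mu, sigma^2); the sample is not
# a differentiable function of (mu, sigma).
z_centered = rng.normal(mu, sigma, size=100_000)

# Non-centered: z = mu + sigma * eps with eps ~ N(0, 1); now
# dz/dmu = 1 and dz/dsigma = eps, so gradients flow through the sample.
eps = rng.normal(size=100_000)
z_noncentered = mu + sigma * eps

# Both parameterizations define the same distribution.
print(round(z_noncentered.mean(), 1), round(z_noncentered.std(), 1))
# 1.5 0.7
```

The distributions match, but only the non-centered form exposes the latent sample as a deterministic, differentiable function of the parameters, which is what gradient-based posterior inference needs.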
Model Regularization for Stable Sample Rollouts
Abstract

Cited by 2 (1 self)
When an imperfect model is used to generate sample rollouts, its errors tend to compound – a flawed sample is given as input to the model, which causes more errors, and so on. This presents a barrier to applying rollout-based planning algorithms to learned models. To address this issue, a training methodology called “hallucinated replay” is introduced, which adds samples from the model into the training data, thereby training the model to produce sensible predictions when its own samples are given as input. Capabilities and limitations of this approach are studied empirically. In several examples, hallucinated replay allows effective planning with imperfect models, while models trained using only real experience fail dramatically.
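A minimal sketch of the idea on a one-dimensional linear dynamics model — all details here (linear model, toy dynamics, learning rate) are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# Minimal sketch of the "hallucinated replay" idea on a 1-D linear
# dynamics model (all details are illustrative assumptions, not the
# paper's setup). The model f(s) = w*s + b is trained not only on real
# transitions (s1, s2) but also on pairs (f(s0), s2): its own prediction
# of s1 is fed back as input, still targeting the real next state s2, so
# the model learns to make sensible predictions from its own outputs.
true_next = lambda s: 0.9 * s + 1.0   # unknown true dynamics
rng = np.random.default_rng(3)
w, b = 0.0, 0.0                       # linear model s' ~ w*s + b
lr = 0.01

for _ in range(20000):
    s0 = rng.uniform(-5, 5)
    s1, s2 = true_next(s0), true_next(true_next(s0))
    s1_hat = w * s0 + b               # model's own ("hallucinated") s1
    for x in (s1, s1_hat):            # real input and hallucinated input
        err = (w * x + b) - s2        # both share the real target s2
        w -= lr * err * x
        b -= lr * err

print(round(w, 2), round(b, 2))       # approaches w=0.9, b=1.0
```

The second update is the key move: the model's own sample appears on the input side, paired with a real target, so rollouts that feed predictions back into the model stay on familiar ground.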
Iterative Splits of Quadratic Bounds for Scalable Binary Tensor Factorization
Abstract

Cited by 1 (0 self)
Binary matrices and tensors are popular data structures that need to be efficiently approximated by low-rank representations. A standard approach is to minimize the logistic loss, well suited for binary data. In many cases, the number m of nonzero elements in the tensor is much smaller than the total number n of possible entries. This creates a problem for large tensors, because computing the logistic loss has time complexity linear in n. In this work, we show that an alternative approach is to minimize the quadratic loss (root mean square error), which leads to algorithms whose training time complexity is reduced from O(n) to O(m), as proposed earlier in the restricted case of alternating least-squares algorithms. In addition, we propose and study a greedy algorithm that partitions the tensor into smaller tensors, each approximated by a quadratic upper bound. This technique provides a time-accuracy tradeoff between a fast but approximate algorithm and an accurate but slow one. We show that this technique leads to a considerable speedup in learning on real-world tensors.
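Why the quadratic loss can be evaluated in O(m) is easiest to see in the matrix case (the tensor case is analogous); the following is a hedged illustration, not the paper's code. With Y binary and P = U Vᵀ, the Frobenius loss expands as ||Y − P||² = trace((UᵀU)(VᵀV)) − 2 Σ_{nonzeros} P_ij + m, so only the m nonzero coordinates are ever touched.

```python
import numpy as np

# O(m) evaluation of the quadratic loss for a sparse binary matrix
# (illustration only; the tensor case is analogous).
rng = np.random.default_rng(4)
r, c, k, m = 30, 40, 5, 25
U = rng.normal(size=(r, k))
V = rng.normal(size=(c, k))
idx = rng.choice(r * c, size=m, replace=False)   # m distinct nonzeros
rows, cols = idx // c, idx % c

# Fast path: never forms the dense r-by-c prediction matrix.
pred_nz = np.einsum("ik,ik->i", U[rows], V[cols])  # predictions at nonzeros
loss_fast = np.trace((U.T @ U) @ (V.T @ V)) - 2 * pred_nz.sum() + m

# O(n) dense check.
Y = np.zeros((r, c))
Y[rows, cols] = 1.0
loss_dense = np.sum((Y - U @ V.T) ** 2)
print(np.allclose(loss_fast, loss_dense))        # True
```

The trace term costs O((r + c)k²) regardless of n, which is what the logistic loss cannot offer.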
Offline Evaluation of Response Prediction in Online Advertising Auctions
Abstract
Click-through rates and conversion rates are two core machine learning problems in online advertising. The evaluation of such systems is often based on traditional supervised learning metrics that ignore how the predictions are used. These predictions are in fact part of bidding systems in online advertising auctions. We present an empirical evaluation of a metric that is specifically tailored for auctions in online advertising, and show that it correlates better with A/B test results than standard metrics do.
Weakly Supervised Clustering: Learning Fine-Grained Signals from Coarse Side Information
 2013
Abstract
Consider a classification problem where we do not have access to labels for individual training examples, but only have average labels over subpopulations. We give practical examples of this setup, and show how these classification tasks can usefully be analyzed as weakly supervised clustering problems. We propose three approaches to solving the weakly supervised clustering problem, including a latent variables model that performs well in our experiments. We illustrate our methods on an industry dataset that was the original motivation for this research.
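A toy version of the aggregate-label setting: with a linear model, the average prediction over a group equals the prediction on the group's average features, so group-level averages suffice to fit the model. This illustrates the problem setting only, not the paper's latent variables model; all data below are invented.

```python
import numpy as np

# Toy aggregate-label setting (illustrative only): each group reveals
# only its average label, but for a linear model the average prediction
# over a group equals the prediction on the group's average features.
rng = np.random.default_rng(5)
w_true = np.array([2.0, -1.0, 0.5])   # hypothetical ground-truth weights

groups = []
for _ in range(50):                    # 50 subpopulations
    Xg = rng.normal(size=(20, 3))      # 20 unlabeled individuals each
    y_avg = (Xg @ w_true).mean()       # only the group-average label
    groups.append((Xg.mean(axis=0), y_avg))

A = np.array([g[0] for g in groups])   # group-mean features
b = np.array([g[1] for g in groups])   # group-average labels
w_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(w_hat, w_true))      # True (noise-free toy data)
```

The fitted weights then score individual examples even though no individual label was ever observed; nonlinear models lose this exact averaging property, which is where latent variable approaches come in.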