Results 1-10 of 16
Stochastic Variational Inference
Journal of Machine Learning Research (2013, in press)
Cited by 131 (27 self)
Abstract
We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets.
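The core of the algorithm described above is a stochastic natural-gradient update: subsample one document, compute an intermediate estimate of the global variational parameter from it, and mix it into the current estimate with a decaying Robbins-Monro step size. A minimal sketch of that loop (the toy one-parameter model, the constants `tau` and `kappa`, and using a raw observation as the intermediate estimate are illustrative assumptions, not the paper's topic models):

```python
import random

def svi_step(lam, lam_hat, rho):
    # Stochastic natural-gradient update: a convex combination of the
    # current global parameter and the intermediate estimate lam_hat
    # computed from one sampled data point.
    return (1.0 - rho) * lam + rho * lam_hat

def run_svi(data, iters=2000, tau=1.0, kappa=0.7, seed=0):
    rng = random.Random(seed)
    lam = 0.0  # global variational parameter (toy: a posterior mean)
    for t in range(1, iters + 1):
        x = rng.choice(data)          # subsample a single observation
        lam_hat = x                   # intermediate estimate from that point (toy assumption)
        rho = (t + tau) ** (-kappa)   # decaying Robbins-Monro step size
        lam = svi_step(lam, lam_hat, rho)
    return lam
```

Because the step sizes decay at a Robbins-Monro rate, the iterates settle near the full-data optimum while each step touches only a single sampled item, which is what makes the 300K-3.8M document collections tractable.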
Stochastic composite likelihood
Machine Learning, 1997
Cited by 8 (1 self)
Abstract
Maximum likelihood estimators are often of limited practical use due to the intensive computation they require. We propose a family of alternative estimators that maximize a stochastic variation of the composite likelihood function. Each of the estimators resolves the computation-accuracy tradeoff differently, and taken together they span a continuous spectrum of computation-accuracy tradeoff resolutions. We prove the consistency of the estimators and provide formulas for their asymptotic variance, statistical robustness, and computational complexity. We discuss experimental results in the context of Boltzmann machines and conditional random fields. The theoretical and experimental studies demonstrate the effectiveness of the estimators when computational resources are insufficient. They also demonstrate that in some cases reduced computational complexity is associated with robustness, thereby increasing statistical accuracy.
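One way to read "a stochastic variation of the composite likelihood" is that each likelihood component enters a given update with some probability β, with selected components re-weighted by 1/β so the stochastic gradient stays unbiased; β then controls where the estimator sits on the computation-accuracy spectrum. A hedged sketch under that reading, on a toy Gaussian-mean model (the model, `beta`, and the learning rate are illustrative assumptions, not the paper's estimators):

```python
import random

def scl_gradient_step(theta, data, beta, lr, rng):
    # Stochastic composite likelihood sketch: each likelihood component
    # is included with probability beta; included components are
    # up-weighted by 1/beta so the gradient is unbiased in expectation.
    grad = 0.0
    for x in data:
        if rng.random() < beta:
            grad += (x - theta) / beta  # d/dtheta of a unit-variance Gaussian log-density
    return theta + lr * grad / len(data)
```

With β = 1 every component is evaluated (full composite likelihood); smaller β trades per-step computation for extra gradient variance, mirroring the continuous tradeoff the abstract describes.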
Statistical tests for optimization efficiency
2010
Cited by 6 (2 self)
Abstract
Learning problems, such as logistic regression, are typically formulated as pure optimization problems defined on some loss function. We argue that this view ignores the fact that the loss function depends on stochastically generated data, which in turn determines an intrinsic scale of precision for statistical estimation. By considering the statistical properties of the update variables used during the optimization (e.g. gradients), we can construct frequentist hypothesis tests to determine the reliability of these updates. We utilize subsets of the data for computing updates, and use the hypothesis tests to determine when the batch size needs to be increased. This provides computational benefits and avoids overfitting by stopping when the batch size has become equal to the size of the full dataset. Moreover, the proposed algorithms depend on a single interpretable parameter – the probability for an update to be in the wrong direction – which is set to a single value across all algorithms and datasets. In this paper, we illustrate these ideas on three L1-regularized coordinate descent algorithms: L1-regularized L2-loss SVMs, L1-regularized logistic regression, and the Lasso, but we emphasize that the underlying methods are much more generally applicable.
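The test described above can be sketched directly: treat the per-sample gradients in the current batch as draws whose mean estimates the true gradient, and use a normal approximation to bound the probability that the true gradient points the other way; grow the batch when that probability exceeds the single interpretable threshold, and stop once the batch already covers the full dataset. (The doubling schedule and this particular test statistic are illustrative assumptions; the paper tailors its tests to each algorithm.)

```python
import math
import statistics

def prob_wrong_direction(sample_grads):
    # Under a normal approximation, the probability that the true
    # gradient has the opposite sign of the batch mean is
    # Phi(-|mean| / stderr), computed here via erfc.
    n = len(sample_grads)
    mean = statistics.fmean(sample_grads)
    se = statistics.stdev(sample_grads) / math.sqrt(n)
    if se == 0.0:
        return 0.0
    z = abs(mean) / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def adapt_batch_size(sample_grads, batch_size, full_size, p_max=0.1):
    # Grow the batch when the update direction is unreliable; once the
    # batch equals the full dataset, unreliability becomes the
    # principled, automatic stopping criterion.
    if prob_wrong_direction(sample_grads) <= p_max:
        return batch_size, False
    if batch_size >= full_size:
        return batch_size, True
    return min(2 * batch_size, full_size), False
```

Here `p_max` plays the role of the abstract's single parameter: the tolerated probability that an update moves in the wrong direction.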
Unsupervised Subcategorization for Object Detection: Finding Cars from a Driving Vehicle
Cited by 4 (0 self)
Abstract
We present a novel algorithm for unsupervised subcategorization of an object class, in the context of object detection. Dividing the detection problem into smaller subproblems simplifies the object vs. background classification. The algorithm uses an iterative split-and-merge procedure and uses both object and non-object information. Subclasses are automatically split into two new classes if the visual variation is too large, while two classes are merged if they are visually similar. After each iteration, samples are relabeled to the most preferred subclasses. In contrast to existing literature on unsupervised subcategorization, our approach does not fix the number of final subclasses and determines this number using a visual similarity measure. Because we use a fast stochastic learning algorithm, full retraining and relabeling can be applied at each iteration. We show that the algorithm significantly outperforms state-of-the-art multiclass algorithms for a car detection problem using standard HOG features and simple linear classification, while significantly decreasing training time to a few minutes. Additionally, for our car detection problem, the subclasses identified by the algorithm were semantically meaningful and revealed the viewpoint of the object without the use of any motion information.
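The split-and-merge loop described above can be sketched on scalar feature scores: a subclass whose variance exceeds a threshold is split at its centroid, and two subclasses whose centroids are close are merged, so the final number of subclasses is determined by the data rather than fixed in advance. (The 1-D features, the two thresholds, and centroid distance as the similarity measure are illustrative stand-ins for the paper's visual similarity measure.)

```python
import statistics

def centroid(cluster):
    return statistics.fmean(cluster)

def split_and_merge(clusters, split_var=4.0, merge_dist=1.0):
    # One iteration: split subclasses with too much variation at their
    # centroid, then merge subclasses whose centroids are close.
    out = []
    for c in clusters:
        if len(c) > 1 and statistics.pvariance(c) > split_var:
            m = centroid(c)
            out.append([x for x in c if x < m])
            out.append([x for x in c if x >= m])
        else:
            out.append(c)
    merged, used = [], set()
    for i, a in enumerate(out):
        if i in used:
            continue
        for j in range(i + 1, len(out)):
            if j not in used and abs(centroid(a) - centroid(out[j])) < merge_dist:
                a = a + out[j]
                used.add(j)
        merged.append(a)
    return merged
```

Iterating this (with relabeling of samples between iterations, omitted here) converges to a data-driven set of subclasses, which is the mechanism the abstract credits for recovering viewpoint subcategories.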
Statistical Optimization of Non-Negative Matrix Factorization
Cited by 4 (2 self)
Abstract
Non-Negative Matrix Factorization (NMF) is a dimensionality reduction method that has been shown to be very useful for a variety of tasks in machine learning and data mining. One of the fastest algorithms for NMF is the Block Principal Pivoting method (BPP) of Kim & Park (2008b), which follows a block coordinate descent approach. The optimization in each iteration involves solving a large number of expensive least squares problems. Taking the view that the design matrix was generated by a stochastic process, and using the asymptotic normality of the least squares estimator, we propose a method for improving the performance of the BPP method. Our method starts with a small subset of the columns and rows of the original matrix and uses frequentist hypothesis tests to adaptively increase the size of the problem. This achieves two objectives: 1) during the initial phase of the algorithm we solve far fewer, much smaller least squares problems, and 2) all hypothesis tests failing while using all the data represents a principled, automatic stopping criterion. Experiments on three real-world datasets show that our algorithm significantly improves the performance of the original BPP algorithm.
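The subset-growing idea can be sketched on a single least squares coefficient: fit on a small random subset of the data, double the subset, and stop once more data no longer moves the estimate. (A plain convergence tolerance stands in here for the paper's frequentist hypothesis test, and the one-coefficient problem is an illustrative simplification of the least squares subproblems inside BPP.)

```python
import random

def lstsq_1d(xs, ys):
    # Closed-form least squares for the model y ~ w * x (one coefficient).
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs)
    return num / den

def subset_growing_fit(xs, ys, start=8, seed=0):
    # Fit on a small random subset, double the subset, and stop once
    # using more data no longer changes the solution materially.
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n = start
    w = lstsq_1d([xs[i] for i in idx[:n]], [ys[i] for i in idx[:n]])
    while n < len(xs):
        n = min(2 * n, len(xs))
        w_new = lstsq_1d([xs[i] for i in idx[:n]], [ys[i] for i in idx[:n]])
        if abs(w_new - w) < 1e-3:  # stand-in for the paper's hypothesis test
            return w_new, n
        w = w_new
    return w, n
```

The two objectives in the abstract map directly onto this sketch: early iterations solve much smaller problems, and "no change even with all the data" becomes the automatic stopping rule.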
Fast Training of Object Detection using Stochastic Gradient Descent
2010 International Conference on Pattern Recognition
Abstract
Training datasets for object detection problems are typically very large and Support Vector Machine (SVM) implementations are computationally complex. As opposed to these complex techniques, we use Stochastic Gradient Descent (SGD) algorithms that use only a single new training sample in each iteration and process samples in a stream-like fashion. We have incorporated SGD optimization in an object detection framework. The object detection problem is typically highly asymmetric, because of the limited variation in object appearance compared to the background. Incorporating SGD speeds up the optimization process significantly, requiring only a single iteration over the training set to obtain results comparable to state-of-the-art SVM techniques. SGD optimization is linearly scalable in time and the obtained speedup in computation time is two to three orders of magnitude. We show that by considering only part of the total training set, SGD converges quickly to the overall optimum.
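The streaming optimization described above can be sketched as Pegasos-style SGD on the hinge loss of a linear SVM, where each update touches exactly one sample, making training time linear in the dataset size. (The step-size schedule, the regularization constant, and the toy 2-D data are illustrative assumptions; the paper's detector trains on HOG features.)

```python
import random

def sgd_svm_train(samples, labels, lam=0.01, epochs=1, seed=0):
    # Single-sample SGD on the hinge loss of a linear SVM: each update
    # processes one streamed (x, y) pair with y in {+1, -1}.
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [0.0] * dim
    t = 0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos-style decaying step size
            x, y = samples[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Regularization shrink, plus a hinge-loss step on margin violators.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0.0 else -1
```

With `epochs=1` this is the single-pass regime the abstract claims is sufficient, and because every sample is seen once and discarded, memory and time scale linearly with the training set.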