Results 1-10 of 214
Pegasos: Primal Estimated sub-gradient solver for SVM
Cited by 131 (10 self)
We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require Ω(1/ɛ²) iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with 1/λ, where λ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is Õ(d/(λɛ)), where d is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
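A minimal Python sketch of the kind of update this abstract describes: each iteration draws a single training example and takes a sub-gradient step on the regularized hinge-loss objective. The step size 1/(λt), the names `pegasos` and `predict`, and the dense-list feature representation are assumptions made for illustration. The abstract does not spell out these details, and the sketch leaves out any projection step and the kernel extension.

```python
import random

def pegasos(examples, lam, num_iters, seed=0):
    """Stochastic sub-gradient descent on the primal linear SVM objective.

    examples: list of (x, y) pairs, x a list of floats, y in {-1, +1}.
    lam: regularization parameter (lambda in the abstract).
    """
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w = [0.0] * dim
    for t in range(1, num_iters + 1):
        x, y = rng.choice(examples)      # one training example per iteration
        eta = 1.0 / (lam * t)            # assumed step-size schedule
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Shrink w: this is the step along the regularizer's gradient.
        w = [(1.0 - eta * lam) * wi for wi in w]
        # Add the hinge-loss sub-gradient only when the margin is violated.
        if margin < 1.0:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Sign of the linear score, with ties broken toward +1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0.0 else -1
```

Each iteration touches only one example, so its cost depends on the number of features and not on the size of the training set, which is the property the abstract emphasizes.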
Opportunistic transmission scheduling with resource-sharing constraints in wireless networks
IEEE Journal on Selected Areas in Communications, 2001
Cited by 117 (8 self)
We present an “opportunistic” transmission scheduling policy that exploits time-varying channel conditions and maximizes the system performance stochastically under a certain resource allocation constraint. We establish the optimality of the scheduling scheme, and also that every user experiences a performance improvement over any non-opportunistic scheduling policy when users have independent performance values. We demonstrate via simulation results that the scheme is robust to estimation errors, and also works well for nonstationary scenarios, resulting in performance improvements of 20–150% compared with a scheduling scheme that does not take into account channel conditions. Last, we discuss an extension of our opportunistic scheduling scheme to improve “short-term” performance.
A framework for opportunistic scheduling in wireless networks
COMPUTER NETWORKS, 2003
Cited by 100 (5 self)
We present a method, called opportunistic scheduling, for exploiting the time-varying nature of the radio environment to increase the overall performance of the system under certain quality of service/fairness requirements of users. We first introduce a general framework for opportunistic scheduling, and then identify three general categories of scheduling problems under this framework. We provide optimal solutions for each of these scheduling problems. All the proposed scheduling policies are implementable online; we provide parameter estimation algorithms and implementation procedures for them. We also show how previous work by us and others directly fits into or is related to this framework. We demonstrate via simulation that opportunistic scheduling schemes result in significant performance improvement compared with non-opportunistic alternatives.
Markov Chain Monte Carlo Estimation of Exponential Random Graph Models
Journal of Social Structure, 2002
Cited by 84 (13 self)
This paper is about estimating the parameters of the exponential random graph model, also known as the p* model, using frequentist Markov chain Monte Carlo (MCMC) methods. The exponential random graph model is simulated using Gibbs or Metropolis-Hastings sampling. The estimation procedures considered are based on the Robbins-Monro algorithm for approximating a solution to the likelihood equation.
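For readers unfamiliar with the Robbins-Monro algorithm mentioned here, the following is a generic Python sketch of the basic recursion for finding a root of a function that can only be observed with noise. It is not the paper's estimation procedure: the name `robbins_monro`, the gain sequence a/n, and the scalar setting are assumptions chosen for illustration.

```python
def robbins_monro(noisy_g, theta0, num_iters, a=1.0):
    """Approximate a root of g using only noisy evaluations of g.

    noisy_g: callable returning g(theta) plus zero-mean noise.
    a: scale of the assumed gain sequence a / n.
    """
    theta = theta0
    for n in range(1, num_iters + 1):
        # Step against the noisy measurement; the shrinking gain averages
        # the noise out over many iterations.
        theta = theta - (a / n) * noisy_g(theta)
    return theta
```

In the setting of the abstract, the noisy measurement would come from MCMC simulation of the graph model, and the root sought is a solution of the likelihood equation.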
Adaptive Stochastic Approximation by the Simultaneous Perturbation Method
2000
Cited by 51 (3 self)
Stochastic approximation (SA) has long been applied for problems of minimizing loss functions or root finding with noisy input information. As with all stochastic search algorithms, there are adjustable algorithm coefficients that must be specified, and that can have a profound effect on algorithm performance. It is known that choosing these coefficients according to an SA analog of the deterministic Newton–Raphson algorithm provides an optimal or near-optimal form of the algorithm. However, directly determining the required Hessian matrix (or Jacobian matrix for root finding) to achieve this algorithm form has often been difficult or impossible in practice. This paper presents a general adaptive SA algorithm that is based on a simple method for estimating the Hessian matrix, while concurrently estimating the primary parameters of interest. The approach applies in both the gradient-free optimization (Kiefer–Wolfowitz) and root-finding/stochastic gradient-based (Robbins–Monro) settings, and is based on the "simultaneous perturbation (SP)" idea introduced previously. The algorithm requires only a small number of loss function or gradient measurements per iteration, independent of the problem dimension, to adaptively estimate the Hessian and parameters of primary interest. Aside from introducing the adaptive SP approach, this paper presents practical implementation guidance, asymptotic theory, and a nontrivial numerical evaluation. Also included is a discussion and numerical analysis comparing the adaptive SP approach with the iterate-averaging approach to accelerated SA.
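The "simultaneous perturbation" idea can be illustrated with a short Python sketch of the basic first-order form, which estimates a gradient from two loss measurements regardless of dimension. This is not the adaptive, Hessian-estimating algorithm the paper presents. The name `spsa_minimize`, the gain constants `a` and `c`, the decay exponents 0.602 and 0.101, and the ±1 perturbation distribution are assumptions taken from common practice, not from this abstract.

```python
import random

def spsa_minimize(loss, theta, num_iters, a=0.1, c=0.1, seed=0):
    """First-order simultaneous perturbation stochastic approximation."""
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, num_iters + 1):
        a_k = a / k ** 0.602             # assumed step-size decay
        c_k = c / k ** 0.101             # assumed perturbation-size decay
        # Perturb every coordinate at once with independent +/-1 draws.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # Two loss measurements per iteration, whatever the dimension.
        diff = loss(plus) - loss(minus)
        grad_est = [diff / (2.0 * c_k * d) for d in delta]
        theta = [t - a_k * g for t, g in zip(theta, grad_est)]
    return theta
```

A finite-difference estimate would instead need two measurements per coordinate, which is the cost the simultaneous perturbation avoids.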
Hidden conditional random fields for phone classification
In Interspeech, 2005
Cited by 50 (6 self)
In this paper, we show the novel application of hidden conditional random fields (HCRFs) – conditional random fields with hidden state sequences – for modeling speech. Hidden state sequences are critical for modeling the non-stationarity of speech signals. We show that HCRFs can easily be trained using the simple direct optimization technique of stochastic gradient descent. We present the results on the TIMIT phone classification task and show that HCRFs outperform comparable ML and CML/MMI trained HMMs. In fact, HCRF results on this task are the best single classifier results known to us. We note that the HCRF framework is easily extensible to recognition since it is a state and label sequence modeling technique. We also note that HCRFs have the ability to handle complex features without any change in training procedure.
An Overview of the Simultaneous Perturbation Method for Efficient Optimization
Cited by 49 (1 self)
This article is an introduction to the simultaneous perturbation stochastic approximation (SPSA) algorithm for stochastic optimization of multivariate systems. Optimization algorithms play a critical role in the design, analysis, and control of most engineering systems and are in widespread use in the work of APL and other organizations: "The future, in fact, will be full of [optimization] algorithms. They are becoming part of almost everything. They are moving up the complexity chain to make entire companies more efficient. They also are moving down the chain as computers spread." (USA Today, 31 Dec 1997) Before presenting the SPSA algorithm, we provide some general background on the stochastic optimization context of interest here
Convergence of a stochastic approximation version of the EM algorithm
1997
Cited by 47 (7 self)
The Expectation Maximization (EM) algorithm is a powerful computational technique for locating maxima of functions...
On-line EM Algorithm for the Normalized Gaussian Network
1999
Cited by 45 (6 self)
A Normalized Gaussian Network (NGnet) (Moody and Darken 1989) is a network of local linear regression units. The model softly partitions the input space by normalized Gaussian functions and each local unit linearly approximates the output within the partition. In this article, we propose a new on-line EM algorithm for the NGnet, which is derived from the batch EM algorithm (Xu, Jordan and Hinton 1995) by introducing a discount factor. We show that the on-line EM algorithm is equivalent to the batch EM algorithm if a specific scheduling of the discount factor is employed. In addition, we show that the on-line EM algorithm can be considered as a stochastic approximation method to find the maximum likelihood estimator. A new regularization method is proposed in order to deal with a singular input distribution. In order to manage dynamic environments, where the input-output distribution of data changes over time, unit manipulation mechanisms such as unit production, unit deletion...

