Adaptive Overrelaxed Bound Optimization Methods
- In Proceedings of the International Conference on Machine Learning, ICML
, 2003
Cited by 25 (0 self)
We study a class of overrelaxed bound optimization algorithms, and their relationship to standard bound optimizers, such as Expectation-Maximization, Iterative Scaling, CCCP and Non-Negative Matrix Factorization. We provide a theoretical analysis of the convergence properties of these optimizers and identify analytic conditions under which they are expected to outperform the standard versions. Based on this analysis, we propose a novel, simple adaptive overrelaxed scheme for practical optimization and report empirical results on several synthetic and real-world data sets showing that these new adaptive methods exhibit superior performance (in certain cases by several orders of magnitude) compared to their traditional counterparts. Our "drop-in" extensions are simple to implement, apply to a wide variety of algorithms, almost always give a substantial speedup, and do not require any theoretical analysis of the underlying algorithm.
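The general idea of an adaptive overrelaxed step can be illustrated with a minimal sketch. This is not the paper's exact scheme: the function name `adaptive_overrelaxed`, the growth factor `eta_growth`, the stopping rule, and the toy mixing-weight problem with its data are all assumptions made for illustration. The sketch steps `eta` times as far as the standard update, grows `eta` while the objective keeps improving, and falls back to the standard step (resetting `eta` to 1) otherwise.

```python
import math

def adaptive_overrelaxed(update, objective, theta, eta_growth=1.1,
                         tol=1e-6, max_iter=1000):
    """Maximise `objective` by overrelaxing the bound-optimizer step `update`.

    `update` maps a parameter list to the standard (eta = 1) step, e.g. one
    EM iteration. Returns (theta, objective value, iterations used).
    """
    eta = 1.0
    value = objective(theta)
    for iteration in range(1, max_iter + 1):
        target = update(theta)
        # Stop once the standard step barely moves the parameters.
        if max(abs(s - t) for s, t in zip(target, theta)) < tol:
            return theta, value, iteration
        # Overrelaxed proposal: step eta times as far as the standard update.
        candidate = [t + eta * (s - t) for t, s in zip(theta, target)]
        cand_value = objective(candidate)
        if eta > 1.0 and not cand_value > value:
            # Proposal did not improve the objective: take the safe standard
            # step instead and reset the step multiplier.
            candidate, cand_value, eta = target, objective(target), 1.0
        else:
            eta *= eta_growth
        theta, value = candidate, cand_value
    return theta, value, max_iter

# Toy problem (hypothetical data): mixing weight of N(1, 1) versus N(0, 1).
DATA = [-1.2, -0.7, -0.4, 0.1, 0.3, 0.5, 0.8, 1.1, 1.5, 2.0]

def density(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def log_likelihood(theta):
    pi = theta[0]
    if not 0.0 < pi < 1.0:
        return -math.inf  # outside the domain, so such proposals are rejected
    return sum(math.log(pi * density(x, 1.0) + (1.0 - pi) * density(x, 0.0))
               for x in DATA)

def em_step(theta):
    pi = theta[0]
    resp = [pi * density(x, 1.0)
            / (pi * density(x, 1.0) + (1.0 - pi) * density(x, 0.0))
            for x in DATA]
    return [sum(resp) / len(resp)]  # new weight = mean responsibility

(pi_fast,), ll_fast, n_fast = adaptive_overrelaxed(
    em_step, log_likelihood, [0.5])
(pi_plain,), ll_plain, n_plain = adaptive_overrelaxed(
    em_step, log_likelihood, [0.5], eta_growth=1.0)  # eta stays 1: plain EM
print(pi_fast, n_fast, pi_plain, n_plain)
```

Setting `eta_growth=1.0` keeps `eta` at 1, so the same function doubles as the standard optimizer for comparison. The wrapper only needs the update and the objective as black boxes, which is one way to read the "drop-in" claim in the abstract.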
Robust maximum-likelihood estimation of multivariable dynamic systems
- Automatica
, 2005
Cited by 21 (11 self)
This paper examines the problem of estimating linear time-invariant state-space system models. In particular it addresses the parametrization and numerical robustness concerns that arise in the multivariable case. These difficulties are well recognised in the literature, resulting (for example) in extensive study of subspace based techniques, as well as recent interest in “data driven” local co-ordinate approaches to gradient search solutions. The paper here proposes a different strategy that employs the Expectation Maximisation (EM) technique. The consequence is an algorithm that is iterative, and locally convergent to stationary points of the (Gaussian) Likelihood function. Furthermore, theoretical and empirical evidence presented here establishes additional attractive properties such as numerical robustness, avoidance of difficult parametrization choices, the ability to estimate unstable systems, the ability to naturally and easily estimate non-zero initial conditions, and moderate computational cost. Moreover, since the methods here are Maximum-Likelihood based, they have associated known and asymptotically optimal statistical properties.
Gaussian mean shift is an EM algorithm
- IEEE Trans. on Pattern Analysis and Machine Intelligence
, 2005
Cited by 16 (3 self)
The mean-shift algorithm, based on ideas proposed by Fukunaga and Hostetler (1975), is a hill-climbing algorithm on the density defined by a finite mixture or a kernel density estimate. Mean-shift can be used as a nonparametric clustering method and has attracted recent attention in computer vision applications such as image segmentation or tracking. We show that, when the kernel is Gaussian, mean-shift is an expectation-maximisation (EM) algorithm, and when the kernel is non-Gaussian, mean-shift is a generalised EM algorithm. This implies that mean-shift converges from almost any starting point and that, in general, its convergence is of linear order. For Gaussian mean-shift we show: (1) the rate of linear convergence approaches 0 (superlinear convergence) for very narrow or very wide kernels, but is often close to 1 (thus extremely slow) for intermediate widths, and exactly 1 (sublinear convergence) for widths at which modes merge; (2) the iterates approach the mode along the local principal component of the data points from the inside of the convex hull of the data points; (3) the convergence domains are nonconvex and can be disconnected and show fractal behaviour. We suggest ways of accelerating mean-shift based on the EM interpretation.
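To make the EM reading of Gaussian mean-shift concrete, here is a minimal one-dimensional sketch. The function name `gaussian_mean_shift`, the bandwidth, the stopping tolerance, and the sample points are illustrative assumptions, not taken from the paper. Each iteration computes kernel weights at the current point (the E-step view) and then moves to the weighted mean of the data (the M-step view).

```python
import math

def gaussian_mean_shift(points, start, bandwidth, tol=1e-8, max_iter=1000):
    """Climb from `start` to a mode of a 1-D Gaussian kernel density estimate."""
    x = start
    for _ in range(max_iter):
        # E-step view: unnormalised posterior weight of each kernel given x.
        weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
        # M-step view: the new x is the weighted mean of the data points.
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

# Hypothetical data with two well-separated clusters.
points = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
mode_left = gaussian_mean_shift(points, start=0.5, bandwidth=0.5)
mode_right = gaussian_mean_shift(points, start=4.0, bandwidth=0.5)
print(mode_left, mode_right)
```

Different starting points climb to different modes, which is what makes the iteration usable as a clustering method: points are grouped by the mode they converge to.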
The Art of Data Augmentation
, 2001
Cited by 10 (3 self)
The term data augmentation refers to methods for constructing iterative optimization or sampling algorithms via the introduction of unobserved data or latent variables. For deterministic algorithms, the method was popularized in the general statistical community by the seminal article by Dempster, Laird, and Rubin on the EM algorithm for maximizing a likelihood function or, more generally, a posterior density. For stochastic algorithms, the method was popularized in the statistical literature by Tanner and Wong’s Data Augmentation algorithm for posterior sampling and in the physics literature by Swendsen and Wang’s algorithm for sampling from the Ising and Potts models and their generalizations; in the physics literature, the method of data augmentation is referred to as the method of auxiliary variables. Data augmentation schemes were used by Tanner and Wong to make simulation feasible and simple, while auxiliary variables were adopted by Swendsen and Wang to improve the speed of iterative simulation. In general, however, constructing data augmentation schemes that result in both simple and fast algorithms is a matter of art in that successful strategies vary greatly with the (observed-data) models being considered. After an overview of data augmentation/auxiliary variables and some recent developments in methods for constructing such
Cross-fertilizing strategies for better EM mountain climbing and DA field exploration: A graphical guide book
, 2009
Cited by 5 (4 self)
In recent years, a variety of extensions and refinements have been developed for data augmentation based model fitting routines. These developments aim to extend the application, improve the speed, and/or simplify the implementation of data augmentation methods, such as the deterministic EM algorithm for mode finding and the stochastic Gibbs sampler and other auxiliary-variable based methods for posterior sampling. In this overview article we graphically illustrate and compare a number of these extensions, all of which aim to maintain the simplicity and computational stability of their predecessors. We particularly emphasize the usefulness of identifying similarities between the deterministic and stochastic counterparts as we seek more efficient computational strategies. We also demonstrate the applicability of data augmentation methods for handling complex models
Nesting EM algorithms for computational efficiency
- Statistica Sinica
, 2000
Cited by 3 (3 self)
Computing posterior modes (e.g., maximum likelihood estimates) for models involving latent variables or missing data often involves complicated optimization procedures. By splitting this task into two simpler parts, however, EM-type algorithms often offer a simple solution. Although this approach has proven useful, in some settings even these simpler tasks are challenging. In particular, computations involving latent variables are typically difficult to simplify. Thus, in models such as hierarchical models with complicated latent variable structures, computationally intensive methods may be required for the expectation step of EM. This paper describes how nesting two or more EM algorithms can take advantage of closed form conditional expectations and lead to algorithms which converge faster, are straightforward to implement, and enjoy stable convergence properties. Methodology to monitor convergence of nested EM algorithms is developed using importance and bridge sampling. The strategy is applied to hierarchical probit and t regression models to derive algorithms which incorporate aspects of Monte-Carlo EM, PX-EM, and nesting in order to combine computational efficiency with easy implementation. Key words and phrases: Bridge sampling, efficient data augmentation, Gibbs sampler, GLMM, hierarchical models, importance sampling, MCEM algorithm, MCMC, probit models, t-models, working parameters.
On computing the largest fraction of missing information for the EM algorithm and the worst linear function for data augmentation
Cited by 3 (0 self)
We address the problem of computing the largest fraction of missing information for the EM algorithm and the worst linear function for data augmentation. These are the largest eigenvalue and its associated eigenvector for the Jacobian of the EM operator at a maximum likelihood estimate, which are important for assessing convergence in iterative simulation. An estimate of the largest fraction of missing information is available from the EM iterates; this is often adequate since only a few figures of accuracy are needed. In some instances the EM iteration also gives an estimate of the worst linear function. We show that improved estimates can be essential for proper inference. In order to obtain improved estimates efficiently, we use the power method for eigencomputation. Unlike eigenvalue decomposition, the power method computes only the largest eigenvalue and eigenvector of a matrix, it can take advantage of a good eigenvector estimate as an initial value and it can be terminated after only a few figures of accuracy are achieved. Moreover, the matrix products needed in the power method can be computed by extrapolation, obviating the need to form the Jacobian of the EM operator. We give results of simulation studies on multivariate normal data showing that this approach becomes more efficient as the data dimension increases than methods that use a finite-difference approximation to the Jacobian, which is the only general-purpose alternative available.
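The power method mentioned above can be sketched in a few lines. This is a generic illustration under stated assumptions, not the paper's procedure: the function name `power_method`, the 2x2 example matrix, and the explicit `matvec` closure are hypothetical, and the paper's extrapolation trick for obtaining Jacobian-vector products without forming the Jacobian is not reproduced. The sketch does show the three properties the abstract relies on: only matrix-vector products are needed, a good starting vector can be supplied, and iteration can stop at a loose relative tolerance.

```python
import math

def power_method(matvec, v0, rel_tol=1e-3, max_iter=500):
    """Estimate the dominant eigenvalue and eigenvector using only `matvec`.

    `v0` is the starting vector (a good initial guess speeds things up);
    `rel_tol` is the relative change in the eigenvalue estimate at which to stop.
    """
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    lam = 0.0
    for _ in range(max_iter):
        w = matvec(v)
        # Rayleigh quotient; v has unit length, so no division is needed.
        new_lam = sum(a * b for a, b in zip(v, w))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        if abs(new_lam - lam) <= rel_tol * abs(new_lam):
            return new_lam, v
        lam = new_lam
    return lam, v

# Hypothetical stand-in for a Jacobian with eigenvalues in [0, 1).
A = [[0.5, 0.2],
     [0.2, 0.3]]

def matvec(v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

lam, vec = power_method(matvec, [1.0, 1.0], rel_tol=1e-6)
print(lam, vec)
```

Because the routine takes `matvec` as a function rather than a matrix, any other way of producing the product of the Jacobian with a vector could be substituted without changing the iteration.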
SPARLS: The Sparse RLS Algorithm
, 2010
We develop a Recursive L1-Regularized Least Squares (SPARLS) algorithm for the estimation of a sparse tap-weight vector in the adaptive filtering setting. The SPARLS algorithm exploits noisy observations of the tap-weight vector output stream and produces its estimate using an Expectation-Maximization type algorithm. We prove the convergence of the SPARLS algorithm to a near-optimal estimate in a stationary environment and present analytical results for the steady state error. Simulation studies in the context of channel estimation, employing multi-path wireless channels, show that the SPARLS algorithm has significant improvement over the conventional widely-used Recursive Least Squares (RLS) algorithm in terms of mean squared error (MSE). Moreover, these simulation studies suggest that the SPARLS algorithm (with slight modifications) can operate with lower computational requirements than the RLS algorithm, when applied to tap-weight vectors with fixed support.

