Results 1–10 of 12
Adaptive Overrelaxed Bound Optimization Methods
In Proceedings of the International Conference on Machine Learning (ICML), 2003
Cited by 30 (0 self)
Abstract:
We study a class of overrelaxed bound optimization algorithms, and their relationship to standard bound optimizers, such as Expectation-Maximization, Iterative Scaling, CCCP and Non-Negative Matrix Factorization. We provide a theoretical analysis of the convergence properties of these optimizers and identify analytic conditions under which they are expected to outperform the standard versions. Based on this analysis, we propose a novel, simple adaptive overrelaxed scheme for practical optimization and report empirical results on several synthetic and real-world data sets showing that these new adaptive methods exhibit superior performance (in certain cases by several orders of magnitude) compared to their traditional counterparts. Our "drop-in" extensions are simple to implement, apply to a wide variety of algorithms, almost always give a substantial speedup, and do not require any theoretical analysis of the underlying algorithm.
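The overrelaxation idea in this abstract has a compact generic form: wrap any bound-optimizer map M (e.g. an EM step) as θ ← θ + η(M(θ) − θ) with η ≥ 1, and adapt η. A minimal Python sketch follows; the particular adaptation rule (`grow` the factor on success, reset to 1 on failure) is an illustrative assumption, not the authors' exact scheme.

```python
import numpy as np

def adaptive_overrelaxed(theta, em_step, loglik, eta=1.0, grow=1.1, n_iter=100):
    """Wrap a bound-optimizer map M (e.g. an EM step) with adaptive overrelaxation.

    em_step(theta) -> standard update M(theta)
    loglik(theta)  -> objective being maximized
    """
    best = loglik(theta)
    for _ in range(n_iter):
        m = em_step(theta)                     # safe bound-optimizer update
        candidate = theta + eta * (m - theta)  # overrelaxed step, eta >= 1
        if loglik(candidate) > best:
            theta = candidate
            eta *= grow                        # step helped: relax more next time
        else:
            theta = m                          # fall back to the guaranteed step
            eta = 1.0                          # reset the relaxation factor
        best = loglik(theta)
    return theta
```

Because the wrapper only calls `em_step` and `loglik` as black boxes, it is a "drop-in" extension in the sense of the abstract: no analysis of the underlying algorithm is needed.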
Robust maximum-likelihood estimation of multivariable dynamic systems
Automatica, 2005
Cited by 27 (12 self)
Abstract:
This paper examines the problem of estimating linear time-invariant state-space system models. In particular it addresses the parametrization and numerical robustness concerns that arise in the multivariable case. These difficulties are well recognised in the literature, resulting (for example) in extensive study of subspace-based techniques, as well as recent interest in “data driven” local coordinate approaches to gradient-search solutions. The paper here proposes a different strategy that employs the Expectation Maximisation (EM) technique. The consequence is an algorithm that is iterative, and locally convergent to stationary points of the (Gaussian) likelihood function. Furthermore, theoretical and empirical evidence presented here establishes additional attractive properties such as numerical robustness, avoidance of difficult parametrization choices, the ability to estimate unstable systems, the ability to naturally and easily estimate nonzero initial conditions, and moderate computational cost. Moreover, since the methods here are Maximum-Likelihood based, they have associated known and asymptotically optimal statistical properties.
Gaussian mean shift is an EM algorithm
IEEE Trans. on Pattern Analysis and Machine Intelligence, 2005
Cited by 24 (4 self)
Abstract:
The mean-shift algorithm, based on ideas proposed by Fukunaga and Hostetler (1975), is a hill-climbing algorithm on the density defined by a finite mixture or a kernel density estimate. Mean-shift can be used as a nonparametric clustering method and has attracted recent attention in computer vision applications such as image segmentation or tracking. We show that, when the kernel is Gaussian, mean-shift is an expectation-maximisation (EM) algorithm, and when the kernel is non-Gaussian, mean-shift is a generalised EM algorithm. This implies that mean-shift converges from almost any starting point and that, in general, its convergence is of linear order. For Gaussian mean-shift we show: (1) the rate of linear convergence approaches 0 (superlinear convergence) for very narrow or very wide kernels, but is often close to 1 (thus extremely slow) for intermediate widths, and exactly 1 (sublinear convergence) for widths at which modes merge; (2) the iterates approach the mode along the local principal component of the data points from the inside of the convex hull of the data points; (3) the convergence domains are nonconvex and can be disconnected and show fractal behaviour. We suggest ways of accelerating mean-shift based on the EM interpretation.
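The EM structure the abstract describes is visible in a minimal sketch of the Gaussian mean-shift iteration: the weight computation plays the role of an E step (responsibilities) and the weighted mean the M step. This is standard mean-shift; the function and parameter names are illustrative.

```python
import numpy as np

def gaussian_mean_shift(x, data, bandwidth, n_iter=200, tol=1e-8):
    """Gaussian mean-shift iteration, written to expose its EM structure."""
    for _ in range(n_iter):
        # "E step": posterior-like responsibilities of each data point for x
        d2 = np.sum((data - x) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)
        w /= w.sum()
        # "M step": move x to the weighted mean of the data
        x_new = w @ data
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```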
The Art of Data Augmentation
2001
Cited by 22 (3 self)
Abstract:
The term data augmentation refers to methods for constructing iterative optimization or sampling algorithms via the introduction of unobserved data or latent variables. For deterministic algorithms, the method was popularized in the general statistical community by the seminal article by Dempster, Laird, and Rubin on the EM algorithm for maximizing a likelihood function or, more generally, a posterior density. For stochastic algorithms, the method was popularized in the statistical literature by Tanner and Wong’s Data Augmentation algorithm for posterior sampling and in the physics literature by Swendsen and Wang’s algorithm for sampling from the Ising and Potts models and their generalizations; in the physics literature, the method of data augmentation is referred to as the method of auxiliary variables. Data augmentation schemes were used by Tanner and Wong to make simulation feasible and simple, while auxiliary variables were adopted by Swendsen and Wang to improve the speed of iterative simulation. In general, however, constructing data augmentation schemes that result in both simple and fast algorithms is a matter of art in that successful strategies vary greatly with the (observed-data) models being considered. After an overview of data augmentation/auxiliary variables and some recent developments in methods for constructing such
Cross-fertilizing strategies for better EM mountain climbing and DA field exploration: A graphical guide book
2009
Cited by 8 (5 self)
Abstract:
In recent years, a variety of extensions and refinements have been developed for data-augmentation-based model fitting routines. These developments aim to extend the application, improve the speed, and/or simplify the implementation of data augmentation methods, such as the deterministic EM algorithm for mode finding and the stochastic Gibbs sampler and other auxiliary-variable-based methods for posterior sampling. In this overview article we graphically illustrate and compare a number of these extensions, all of which aim to maintain the simplicity and computational stability of their predecessors. We particularly emphasize the usefulness of identifying similarities between the deterministic and stochastic counterparts as we seek more efficient computational strategies. We also demonstrate the applicability of data augmentation methods for handling complex models
Nesting EM algorithms for computational efficiency
Statistica Sinica, 2000
Cited by 4 (3 self)
Abstract: Computing posterior modes (e.g., maximum likelihood estimates) for models involving latent variables or missing data often involves complicated optimization procedures. By splitting this task into two simpler parts, however, EM-type algorithms often offer a simple solution. Although this approach has proven useful, in some settings even these simpler tasks are challenging. In particular, computations involving latent variables are typically difficult to simplify. Thus, in models such as hierarchical models with complicated latent variable structures, computationally intensive methods may be required for the expectation step of EM. This paper describes how nesting two or more EM algorithms can take advantage of closed-form conditional expectations and lead to algorithms which converge faster, are straightforward to implement, and enjoy stable convergence properties. Methodology to monitor convergence of nested EM algorithms is developed using importance and bridge sampling. The strategy is applied to hierarchical probit and t regression models to derive algorithms which incorporate aspects of Monte Carlo EM, PX-EM, and nesting in order to combine computational efficiency with easy implementation. Key words and phrases: Bridge sampling, efficient data augmentation, Gibbs sampler, GLMM, hierarchical models, importance sampling, MCEM algorithm, MCMC, probit models, t-models, working parameters.
On computing the largest fraction of missing information for the EM algorithm and the worst linear function for data augmentation
Cited by 4 (0 self)
Abstract:
We address the problem of computing the largest fraction of missing information for the EM algorithm and the worst linear function for data augmentation. These are the largest eigenvalue and its associated eigenvector for the Jacobian of the EM operator at a maximum likelihood estimate, which are important for assessing convergence in iterative simulation. An estimate of the largest fraction of missing information is available from the EM iterates; this is often adequate since only a few figures of accuracy are needed. In some instances the EM iteration also gives an estimate of the worst linear function. We show that improved estimates can be essential for proper inference. In order to obtain improved estimates efficiently, we use the power method for eigencomputation. Unlike eigenvalue decomposition, the power method computes only the largest eigenvalue and eigenvector of a matrix; it can take advantage of a good eigenvector estimate as an initial value and it can be terminated after only a few figures of accuracy are achieved. Moreover, the matrix products needed in the power method can be computed by extrapolation, obviating the need to form the Jacobian of the EM operator. We give results of simulation studies on multivariate normal data showing that, as the data dimension increases, this approach becomes more efficient than methods that use a finite-difference approximation to the Jacobian, which is the only general-purpose alternative available.
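The power-method component described in this abstract can be sketched generically: `matvec` stands for any routine returning the Jacobian-vector product J @ v, e.g. the extrapolation (M(θ* + h v) − θ*) / h the abstract mentions, so the Jacobian itself is never formed. This is plain power iteration for illustration, not the paper's implementation.

```python
import numpy as np

def power_method(matvec, v0, n_iter=500, tol=1e-10):
    """Largest eigenvalue and eigenvector of a matrix given only products J @ v."""
    v = v0 / np.linalg.norm(v0)
    lam = 0.0
    for _ in range(n_iter):
        w = matvec(v)
        lam_new = v @ w                    # Rayleigh-quotient eigenvalue estimate
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        v = w / np.linalg.norm(w)          # renormalize the iterate
        if converged:
            break
    return lam, v
```

Passing a good eigenvector estimate as `v0` and loosening `tol` exploits exactly the two advantages the abstract highlights: warm starts and early termination after a few figures of accuracy.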
SPARLS: The Sparse RLS Algorithm
2010
Abstract:
We develop a Recursive L1-Regularized Least Squares (SPARLS) algorithm for the estimation of a sparse tap-weight vector in the adaptive filtering setting. The SPARLS algorithm exploits noisy observations of the tap-weight vector output stream and produces its estimate using an Expectation-Maximization-type algorithm. We prove the convergence of the SPARLS algorithm to a near-optimal estimate in a stationary environment and present analytical results for the steady-state error. Simulation studies in the context of channel estimation, employing multipath wireless channels, show that the SPARLS algorithm has significant improvement over the conventional widely used Recursive Least Squares (RLS) algorithm in terms of mean squared error (MSE). Moreover, these simulation studies suggest that the SPARLS algorithm (with slight modifications) can operate with lower computational requirements than the RLS algorithm, when applied to tap-weight vectors with fixed support.
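SPARLS itself is a recursive, low-complexity algorithm; as a rough illustration of the underlying EM/soft-thresholding connection, here is a batch iterative soft-thresholding sketch for the L1-regularized least-squares objective (illustrative only, not the SPARLS updates; function names are assumptions for exposition).

```python
import numpy as np

def soft_threshold(x, t):
    """Componentwise soft-thresholding, the proximal map of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def l1_least_squares(X, y, lam, n_iter=500):
    """Iterative soft-thresholding for min_w 0.5*||y - X w||^2 + lam*||w||_1."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)           # gradient of the quadratic part
        w = soft_threshold(w - grad / L, lam / L)
    return w
```

Each iteration is a gradient step on the data-fit term followed by soft-thresholding, which is what drives components of the tap-weight estimate exactly to zero.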
Discussion: One-step Sparse Estimates in Nonconcave Penalized Likelihood Models
2007
Abstract:
1. An insider’s minor comments. Section 2.3 seems to be the reason that I am a discussant. There, it was first stated that the proposed LLA algorithm is an instance of the MM algorithm, as termed by Lange, Hunter and Yang [8]. Then it was shown, under certain conditions, that it is also an EM algorithm. My initial reaction was “hmmm, the authors’ reading of Lange, Hunter and Yang [8] must have ceased before reaching its discussions,” because a more general “MM = EM” result using the same Laplace transform technique was the basis for a key inquiry of Meng [10], a discussion of Lange, Hunter and Yang [8]. Upon a more careful reading, I realized that the authors’ construction, though mathematically equivalent to mine, gives a different interpretation to the constructed missing data/latent variable. This is rather interesting, especially if my initial reaction was correct. For the current paper, this “MM = EM” result appears to be of minor interest, especially because its potential benefit is not explored in the paper. Instead, the only punch line seems to be a logically unsubstantiated one: