Results 1 - 9 of 9
Convergence results for the EM Approach to Mixtures of Experts Architectures
Neural Networks, 1995
Cited by 104 (6 self)
The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts architecture of Jordan and Jacobs (1992). They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we provide a theoretical analysis of this algorithm. We show that the algorithm can be regarded as a variable metric algorithm with its searching direction having a positive projection on the gradient of the log likelihood. We also analyze the convergence of the algorithm and provide an explicit expression for the convergence rate. In addition, we describe an acceleration technique that yields a significant speedup in simulation experiments.
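The "positive projection" property described in this abstract can be checked numerically on a toy problem. Below is a minimal sketch, assuming a one-parameter mixture of two fixed unit-variance Gaussians (not the full mixture-of-experts architecture); the data and component means are invented for illustration.

```python
import math

def norm_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

data = [-0.3, 0.1, 0.5, 2.6, 3.1, 3.4]  # toy sample from a two-component mixture

def em_step(w):
    """One EM update of the mixing weight of component N(0,1) (component N(3,1) fixed)."""
    resp = [w * norm_pdf(x, 0.0) / (w * norm_pdf(x, 0.0) + (1 - w) * norm_pdf(x, 3.0))
            for x in data]
    return sum(resp) / len(resp)

def loglik_grad(w):
    """Derivative of the log-likelihood with respect to the mixing weight w."""
    return sum((norm_pdf(x, 0.0) - norm_pdf(x, 3.0)) /
               (w * norm_pdf(x, 0.0) + (1 - w) * norm_pdf(x, 3.0)) for x in data)

w = 0.2
step = em_step(w) - w
# the EM direction has a positive inner product with the log-likelihood gradient
print(step * loglik_grad(w) > 0)
```

In this one-dimensional case the "projection" is just the product of the EM step and the gradient; the paper's result is the multi-parameter analogue.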
Update rules for parameter estimation in Bayesian networks
1997
Cited by 59 (2 self)
This paper reexamines the problem of parameter estimation in Bayesian networks with missing values and hidden variables from the perspective of recent work in online learning [12]. We provide a unified framework for parameter estimation that encompasses both online learning, where the model is continuously adapted to new data cases as they arrive, and the more traditional batch learning, where a pre-accumulated set of samples is used in a one-time model selection process. In the batch case, our framework encompasses both the gradient projection algorithm [2, 3] and the EM algorithm [14] for Bayesian networks. The framework also leads to new online and batch parameter update schemes, including a parameterized version of EM. We provide both empirical and theoretical results indicating that parameterized EM allows faster convergence to the maximum likelihood parameters than does standard EM.
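The parameterized-EM idea can be sketched on the smallest possible Bayesian network: a hidden binary cause H with one observed child X. Everything below (the CPT entries, the data) is hypothetical; the sketch only shows the update scheme, in which a step-length parameter scales the EM step and value 1 recovers standard EM.

```python
# tiny Bayes net: hidden H ~ Bernoulli(theta), observed X with a known, fixed CPT
P_X1_GIVEN_H = {1: 0.9, 0: 0.2}   # hypothetical values of P(X=1 | H)
data = [1, 1, 0, 1, 0, 1, 1, 0]   # observed values of X (H is never observed)

def em_update(theta):
    """Standard EM update of theta = P(H=1), with the CPT held fixed."""
    total = 0.0
    for x in data:
        px_h1 = P_X1_GIVEN_H[1] if x == 1 else 1 - P_X1_GIVEN_H[1]
        px_h0 = P_X1_GIVEN_H[0] if x == 1 else 1 - P_X1_GIVEN_H[0]
        # posterior responsibility P(H=1 | x) under the current theta
        total += theta * px_h1 / (theta * px_h1 + (1 - theta) * px_h0)
    return total / len(data)

def parameterized_em(theta, eta):
    """Take a step of length eta along the EM direction; eta = 1 is standard EM."""
    return theta + eta * (em_update(theta) - theta)

print(abs(parameterized_em(0.3, 1.0) - em_update(0.3)) < 1e-12)  # eta = 1 recovers EM
```

For eta > 1 the update can overshoot the unit interval, so a practical version would clamp theta to [0, 1]; the framework in the paper addresses this in generality.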
A Comparison of New and Old Algorithms for A Mixture Estimation Problem
Machine Learning, 1995
Cited by 36 (14 self)
We investigate the problem of estimating the proportion vector which maximizes the likelihood of a given sample for a mixture of given densities. We adapt a framework developed for supervised learning and give simple derivations for many of the standard iterative algorithms like gradient projection and EM. In this framework, the distance between the new and old proportion vectors is used as a penalty term. The square distance leads to the gradient projection update, and the relative entropy to a new update which we call the exponentiated gradient update (EG_η). Curiously, when a second-order Taylor expansion of the relative entropy is used, we arrive at an update EM_η which, for η = 1, gives the usual EM update. Experimentally, both the EM_η update and the EG_η update for η > 1 outperform the EM algorithm and its variants. We also prove a polynomial bound on the rate of convergence of the EG_η algorithm.
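A minimal sketch of the exponentiated-gradient update for the proportion vector, using two fixed discrete densities in place of the paper's general components; the densities, data, and learning rate below are invented for illustration.

```python
import math

# two fixed component densities on the support {0, 1, 2} (hypothetical values)
f = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
data = [0, 0, 0, 2, 2, 1]  # toy sample

def loglik(w):
    return sum(math.log(w[0] * f[0][x] + w[1] * f[1][x]) for x in data)

def eg_update(w, eta):
    """EG update: multiply each weight by exp(eta * gradient component), renormalize.
    This is the multiplicative update arising from a relative-entropy penalty."""
    n = len(data)
    mix = lambda x: w[0] * f[0][x] + w[1] * f[1][x]
    g = [sum(f[i][x] / mix(x) for x in data) / n for i in range(2)]
    u = [w[i] * math.exp(eta * g[i]) for i in range(2)]
    z = sum(u)
    return [ui / z for ui in u]

w = [0.5, 0.5]
w_new = eg_update(w, 1.0)
print(loglik(w_new) >= loglik(w))  # the update improves the likelihood here
```

Note how the multiplicative form keeps the proportions positive and normalized automatically, which is the appeal of the relative-entropy penalty over the squared distance.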
An iterative procedure for obtaining maximum-likelihood estimates of the parameters for a mixture of normal distributions
SIAM J. Appl. Math., 1978
Cited by 32 (1 self)
This paper addresses the problem of obtaining numerically maximum-likelihood estimates of the parameters for a mixture of normal distributions. In recent literature, a certain successive-approximations procedure, based on the likelihood equations, was shown empirically to be effective in numerically approximating such maximum-likelihood estimates; however, the reliability of this procedure was not established theoretically. Here, we introduce a general iterative procedure, of the generalized steepest-ascent (deflected-gradient) type, which is just the procedure known in the literature when the step size is taken to be 1. We show that, with probability one as the sample size grows large, this procedure converges locally to the strongly consistent maximum-likelihood estimate whenever the step size lies between 0 and 2. We also show that the step size which yields optimal local convergence rates for large samples is determined in a sense by the "separation" of the component normal densities and is bounded below by a number between 1 and 2.
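The step-size-generalized iteration can be sketched in its simplest form: updating one component mean of a two-component normal mixture, with weights and variances held fixed. The data below are toy values; with well-separated components, step sizes on either side of 1 reach the same fixed point, illustrating the local convergence for step sizes in (0, 2).

```python
import math

def norm_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

data = [-0.5, 0.0, 0.4, 2.9, 3.3]  # toy sample; second component fixed at N(3, 1)

def em_map(mu1):
    """Fixed-point map M: the step-size-1 (successive-approximations) update of mu1."""
    r = [0.5 * norm_pdf(x, mu1) / (0.5 * norm_pdf(x, mu1) + 0.5 * norm_pdf(x, 3.0))
         for x in data]
    return sum(ri * x for ri, x in zip(r, data)) / sum(r)

def iterate(mu1, t, steps=100):
    """Generalized iteration mu1 <- mu1 + t * (M(mu1) - mu1), step size 0 < t < 2."""
    for _ in range(steps):
        mu1 = mu1 + t * (em_map(mu1) - mu1)
    return mu1

# step sizes below and above 1 converge to the same fixed point on this toy problem
print(abs(iterate(1.0, 0.5) - iterate(1.0, 1.5)) < 1e-9)
```

The fixed points of the map are stationary points of the likelihood, so any step size in the convergent range recovers the same estimate; only the rate changes, as the abstract discusses.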
Constrained Nonparametric Estimation via Mixtures
2001
Cited by 1 (0 self)
We present a general approach to estimating probability measures constrained to lie in a convex set. We represent constrained measures as mixtures of simple, known extreme measures, and so the problem of estimating a constrained measure becomes one of estimating an unconstrained mixing measure. Convex constraints arise in many modeling situations, such as estimation of the mean and estimation under stochastic ordering constraints. We describe mixture representation techniques for these and other situations, and discuss applications to maximum likelihood and Bayesian estimation.
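A toy instance of the mixture-representation idea: pmfs on {0, 1, 2} constrained to have mean exactly 1 form a line segment whose two extreme points are easy to write down, so estimating the constrained measure reduces to estimating a single mixing weight. The constraint set and data below are illustrative, not taken from the paper.

```python
# extreme points of the convex set {pmfs p on {0,1,2} : mean(p) = 1};
# the set is {(a, 1-2a, a) : 0 <= a <= 1/2}, a segment with these endpoints
E1 = (0.0, 1.0, 0.0)
E2 = (0.5, 0.0, 0.5)

def constrained_mle(counts):
    """ML estimate within the constraint set via the mixing weight lam on E1.
    Mixture: p = lam*E1 + (1-lam)*E2; the likelihood in lam is a product of
    Bernoulli-like terms, so the maximizer is lam_hat = n1 / n."""
    n = sum(counts)
    lam = counts[1] / n
    return [lam * a + (1 - lam) * b for a, b in zip(E1, E2)]

p = constrained_mle([3, 4, 3])        # observed counts of outcomes 0, 1, 2
print(p, sum(i * pi for i, pi in enumerate(p)))  # estimate and its mean (exactly 1)
```

The point of the construction is that the mean constraint holds automatically for every mixing weight, because it holds at both extreme points; the unconstrained estimation happens entirely in the mixing measure.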
Fitting Finite Mixture Model to Exchange Rate Using Maximum Likelihood Estimation
The exchange rate strongly influences a country's inflation and economic growth, and changes in the exchange rate feed directly into import and export prices. Maximum likelihood estimation (MLE) is therefore used to fit a finite mixture model. In this paper, a two-component mixture of normal distributions is fitted by maximum likelihood to the returns of the nominal monthly exchange rates of Malaysia, Thailand and the Philippines. The data cover July 2005 to September 2012. Key words: exchange rate, maximum likelihood estimation, finite
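The fitting step can be sketched with a standard EM loop for a two-component normal mixture. The paper's exchange-rate series is not reproduced here, so the data below are simulated stand-ins; the update equations are the usual ones for weights, means, and standard deviations.

```python
import math, random

def norm_pdf(x, mu, s):
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))

random.seed(0)
# synthetic stand-in for return data: two regimes around 0.0 and 2.0
data = [random.gauss(0.0, 0.5) for _ in range(60)] + \
       [random.gauss(2.0, 0.5) for _ in range(40)]

def em_fit(data, iters=50):
    """EM for a two-component normal mixture: weight, two means, two std devs."""
    w, mu1, mu2, s1, s2 = 0.5, min(data), max(data), 1.0, 1.0
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        r = [w * norm_pdf(x, mu1, s1) /
             (w * norm_pdf(x, mu1, s1) + (1 - w) * norm_pdf(x, mu2, s2))
             for x in data]
        # M-step: weighted moment updates
        n1 = sum(r); n2 = len(data) - n1
        mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / n2
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, data)) / n1)
        s2 = math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, data)) / n2)
        w = n1 / len(data)
    return w, mu1, mu2, s1, s2

w, mu1, mu2, s1, s2 = em_fit(data)
print(round(w, 2), round(mu1, 2), round(mu2, 2))
```

On real return data one would also need a model-selection step to justify two components and care with degenerate (vanishing-variance) solutions, which this sketch omits.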
Exact Fit of Simple Finite Mixture Models
How to forecast next year's portfolio-wide credit default rate based on last year's default observations and the current score distribution? A classical approach to this problem consists of fitting a mixture of the conditional score distributions observed last year to the current score distribution. This is a special (simple) case of a finite mixture model where the mixture components are fixed and only the weights of the components are estimated. The optimum weights provide a forecast of next year's portfolio-wide default rate. We point out that the maximum-likelihood (ML) approach to fitting the mixture distribution not only gives an optimum but even an exact fit if we allow the mixture components to vary but keep their density ratio fixed. From this observation we can conclude that the standard default rate forecast based on last year's conditional default rates will always be located between last year's portfolio-wide default rate and the ML forecast for next year. As an application example, cost quantification is then discussed. We also discuss how the mixture model based estimation methods can be used to forecast total loss.
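The "components fixed, only the weight estimated" case can be made concrete with a small sketch. The bucketed score distributions and current counts below are invented; the single mixture weight p (the portfolio-wide default rate) is found by a tiny EM iteration, which for one parameter converges to the ML weight.

```python
# hypothetical conditional score distributions from last year, over 3 score buckets
f_D = [0.5, 0.3, 0.2]   # score distribution of defaulters
f_N = [0.1, 0.3, 0.6]   # score distribution of non-defaulters
counts = [15, 30, 55]   # current portfolio's score histogram (hypothetical)

def ml_default_rate(p=0.5, iters=200):
    """EM on the single mixture weight p; the components f_D, f_N stay fixed."""
    for _ in range(iters):
        num = den = 0.0
        for k, c in enumerate(counts):
            # responsibility: posterior probability of default given bucket k
            r = p * f_D[k] / (p * f_D[k] + (1 - p) * f_N[k])
            num += c * r
            den += c
        p = num / den
    return p

print(round(ml_default_rate(), 3))  # → 0.125
```

For this one-dimensional problem the log-likelihood is concave in p, so the EM iteration converges to the unique ML forecast regardless of the starting value.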