## Accelerating cyclic update algorithms for parameter estimation by pattern searches


### Download Links

- [www.cis.hut.fi]
- [www.hiit.fi]
- [users.ics.aalto.fi]
- DBLP

### Other Repositories/Bibliography

Venue: Neural Processing Letters

Citations: 16 (9 self)

### BibTeX

@ARTICLE{Honkela_acceleratingcyclic,
  author  = {Antti Honkela and Harri Valpola and Juha Karhunen},
  title   = {Accelerating cyclic update algorithms for parameter estimation by pattern searches},
  journal = {Neural Processing Letters},
  year    = {2003}
}


### Abstract

A popular strategy for dealing with large parameter estimation problems is to split the problem into manageable subproblems and solve them cyclically one by one until convergence. A well-known drawback of this strategy is slow convergence in low-noise conditions. We propose using so-called pattern searches, which consist of an exploratory phase followed by a line search. During the exploratory phase, a search direction is determined by combining the individual updates of all subproblems. The approach can be used to speed up several well-known learning methods, such as variational Bayesian learning (ensemble learning) and the expectation-maximization (EM) algorithm, with modest algorithmic modifications. Experimental results show that the proposed method reduces the required convergence time by 60–85% in realistic variational Bayesian learning problems.
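The accelerated iteration described in the abstract can be sketched as follows. This is a minimal, model-agnostic illustration, not code from the paper: `cyclic_update` and `cost` are hypothetical callables standing in for a concrete model's subproblem updates and its objective (e.g. the free energy in variational Bayesian learning), and the crude multi-step line search stands in for the derivative-free searches discussed later.

```python
import numpy as np

def pattern_search_step(theta, cyclic_update, cost,
                        step_candidates=(1.0, 2.0, 4.0, 8.0)):
    """One accelerated iteration: exploratory phase + line search.

    `cyclic_update` performs one full cycle of subproblem updates and
    returns the new parameter vector; `cost` evaluates the objective.
    """
    # Exploratory phase: one ordinary cycle of parameter-wise updates.
    theta_new = cyclic_update(theta)
    # The combined update of all subproblems defines the search direction.
    direction = theta_new - theta
    # Derivative-free line search along the pattern direction: try a few
    # step lengths and keep the best candidate found.
    best_theta, best_cost = theta_new, cost(theta_new)
    for gamma in step_candidates:
        candidate = theta + gamma * direction
        c = cost(candidate)
        if c < best_cost:
            best_theta, best_cost = candidate, c
    return best_theta
```

Because the line search only ever replaces the plain cyclic update with a cheaper-or-equal point along the same direction, the step can never do worse than the unaccelerated update it wraps.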

### Citations

8090 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...estimates such as maximum a posteriori (MAP) estimate or slightly more advanced EM (expectation maximization) algorithm [5, 17]. In EM algorithm, the parameters are divided into two sets which are then updated cyclically. The algorithm employs point estimates for some of the parameters, which can lead to problems with model o...

4828 | Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context: ...s are divided into two sets which are then updated cyclically. The algorithm employs point estimates for some of the parameters, which can lead to problems with model order estimation and overfitting [4]. Variational methods form one class of more advanced approximations. Their key idea is to approximate the exact posterior distribution p(θ|X) by another distribution q(θ) that is computationally easi...

1081 | Practical Methods of Optimization
- Fletcher
- 1981
Citation Context: ...ves of the cost function, it is reasonable to choose a derivative free line search algorithm. Good general candidates for such algorithms are the golden section method and the method of quadratic fit [6, 2]. The golden section method gives decent worst case performance, but does usually not perform as well as its competitors in more realistic situations. In our examples, the cost function appeared rough...
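The golden section method named in this context is a standard derivative-free line search for a unimodal function on an interval; the routine below is a textbook sketch of it, not code from the paper.

```python
def golden_section_minimize(f, a, b, tol=1e-6):
    """Derivative-free line search: minimize a unimodal f on [a, b]."""
    invphi = (5 ** 0.5 - 1) / 2  # 1/phi ~ 0.618
    # Two interior probe points dividing the interval in golden ratio.
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new right probe.
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new left probe.
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2
```

Each iteration shrinks the bracket by the constant factor 1/phi and costs exactly one new function evaluation, which is what gives the method its worst-case guarantee mentioned in the context.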

831 | An introduction to variational methods for graphical models
- Jordan, Ghahramani, et al.
- 1999
Citation Context: ...nt values. Repeating this process for different sets of parameters, their values eventually converge to an optimum [3]. In particular, this strategy is utilized in the variational Bayesian techniques [12], which decouple the difficult problem of describing the posterior probability density of the model parameters into many smaller tractable problems. It is well known that in the case of low noise, the...

764 | A view of the EM algorithm that justifies incremental sparse and other variants
- Neal, Hinton
- 1998
Citation Context: ...estimates such as maximum a posteriori (MAP) estimate or slightly more advanced EM (expectation maximization) algorithm [5, 17]. In EM algorithm, the parameters are divided into two sets which are then updated cyclically. The algorithm employs point estimates for some of the parameters, which can lead to problems with model o...

716 | Nonlinear Programming: Theory and Algorithms
- Bazaraa, Shetty, et al.
Citation Context: ...ves of the cost function, it is reasonable to choose a derivative free line search algorithm. Good general candidates for such algorithms are the golden section method and the method of quadratic fit [6, 2]. The golden section method gives decent worst case performance, but does usually not perform as well as its competitors in more realistic situations. In our examples, the cost function appeared rough...

221 | Independent factor analysis
- Attias
- 1999
Citation Context: ...the maximum likelihood (ML) and maximum a posterior (MAP) estimation. The method has been successfully applied to various models such as hidden Markov models (HMMs) [16], independent factor analysis [1], nonlinear independent component analysis [13], as well as switching and nonlinear state-space models [7, 18]. It has also been used as a foundation of a general framework of building blocks for late...

143 | Variational learning for switching state-space models
- Ghahramani, Hinton
- 2000
Citation Context: ...plied to various models such as hidden Markov models (HMMs) [16], independent factor analysis [1], nonlinear independent component analysis [13], as well as switching and nonlinear state-space models [7, 18]. It has also been used as a foundation of a general framework of building blocks for latent variable models [20]. This building block framework is employed in the experimental part of this paper. ...

128 | Keeping the neural networks simple by minimizing the description length of the weights
- Hinton, Camp
- 1993
Citation Context: ...ct of several independent distributions, one for each parameter or a set of similar parameters. We use a particular variational method known as ensemble learning that has recently become very popular [8, 15, 14]. In ensemble learning the optimal approximating distribution is found by minimizing the Kullback-Leibler divergence between the approximate and true posterior. After some considerations [14, 15], thi...

87 | An unsupervised ensemble learning method for nonlinear dynamic state-space models
- Valpola, Karhunen
Citation Context: ...plied to various models such as hidden Markov models (HMMs) [16], independent factor analysis [1], nonlinear independent component analysis [13], as well as switching and nonlinear state-space models [7, 18]. It has also been used as a foundation of a general framework of building blocks for latent variable models [20]. This building block framework is employed in the experimental part of this paper. ...

62 | Ensemble learning
- Lappalainen, Miskin
- 2000

49 | Developments in probabilistic modelling with neural networks– ensemble Learning
- Mackay
- 1995
Citation Context: ...ct of several independent distributions, one for each parameter or a set of similar parameters. We use a particular variational method known as ensemble learning that has recently become very popular [8, 15, 14]. In ensemble learning the optimal approximating distribution is found by minimizing the Kullback-Leibler divergence between the approximate and true posterior. After some considerations [14, 15], thi...

48 | Conjugate Gradient Acceleration of the EM Algorithm
- Jamshidian, Jennrich
- 1993
Citation Context: ...requires derivatives of the cost function and does not take advantage of the ability to solve independent subproblems easily. It has anyway been used successfully for speeding up the EM algorithm in [11]. It would be interesting to study whether a similar method will work for other algorithms such as variational Bayesian learning as well, and how it would compare with the pattern search approach prop...

29 | Building blocks for hierarchical latent variable models
- Valpola, Raiko, et al.
- 2001
Citation Context: ...dent component analysis [13], as well as switching and nonlinear state-space models [7, 18]. It has also been used as a foundation of a general framework of building blocks for latent variable models [20]. This building block framework is employed in the experimental part of this paper. 3. A simple illustrative example As an example, consider the simple linear generative model x = As + n used in noisy i...

25 | Independent Component Analysis
- Hyvärinen, Karhunen, et al.
- 2001
Citation Context: ...mployed in the experimental part of this paper. 3. A simple illustrative example As an example, consider the simple linear generative model x = As + n used in noisy independent component analysis (ICA) [10]. There x denotes the known observation vector, s the unknown source vector, A the unknown mixing matrix, and n a vector of additive noise. In ICA, the mixing matrix A is chosen so that the components...
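The noisy ICA generative model x = As + n quoted in this context can be simulated in a few lines; the dimensions, the Laplacian (super-Gaussian) source distribution, and the noise level below are all hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sources, n_obs, n_samples = 3, 5, 1000  # hypothetical sizes

S = rng.laplace(size=(n_sources, n_samples))   # unknown sources s (super-Gaussian)
A = rng.normal(size=(n_obs, n_sources))        # unknown mixing matrix A
N = 0.1 * rng.normal(size=(n_obs, n_samples))  # additive noise n
X = A @ S + N                                  # known observations x = As + n
```

Estimation then amounts to recovering A and S from X alone, which is exactly the kind of large, separable problem the cyclic update strategy targets.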

25 | Nonlinear independent factor analysis by hierarchical models
- Valpola, Östman, et al.
- 2003
Citation Context: ...one eighth of the time required by the cyclic update method. 5.3. Hierarchical nonlinear factor analysis example The third experiment was conducted with a hierarchical nonlinear factor analysis model [19]. The data set used in the experiment was the same 20 dimensional artificial data generated by a random multi-layer perceptron (MLP) network from 8 independent random sources that was also used in [13...

6 | Ensemble learning for hidden Markov models. Available from http://wol.ra.phy.cam.ac.uk/mackay
- MacKay
- 1997
Citation Context: ...nd overfitting problems related to the maximum likelihood (ML) and maximum a posterior (MAP) estimation. The method has been successfully applied to various models such as hidden Markov models (HMMs) [16], independent factor analysis [1], nonlinear independent component analysis [13], as well as switching and nonlinear state-space models [7, 18]. It has also been used as a foundation of a general fram...

6 | 'Direct search' solution of numerical and statistical problems
- Hooke, Jeeves
- 1961
Citation Context: ...nced in higher-dimensional real-world problems. 4. The algorithm We propose speeding up convergence of cyclic update schemes by applying the idea of pattern searches introduced by Hooke and Jeeves in [9]. The pattern search consists of two phases. In the first exploratory phase, the objective function is optimized in each coordinate direction separately as usual. This phase is called parameter-wise u...


2 | Bayesian nonlinear independent component analysis by multi-layer perceptrons
- Lappalainen, Honkela
- 2000
Citation Context: ...terior (MAP) estimation. The method has been successfully applied to various models such as hidden Markov models (HMMs) [16], independent factor analysis [1], nonlinear independent component analysis [13], as well as switching and nonlinear state-space models [7, 18]. It has also been used as a foundation of a general framework of building blocks for latent variable models [20]. This building block fr...

1 | Some Notes on Alternating Optimization
- Bezdek, Hathaway
- 2002
Citation Context: ...d up at a time and optimized while keeping the other parameters frozen to their current values. Repeating this process for different sets of parameters, their values eventually converge to an optimum [3]. In particular, this strategy is utilized in the variational Bayesian techniques [12], which decouple the difficult problem of describing the posterior probability density of the model parameters int...