## Bayesian Neural Networks with Correlating Residuals (1999)

Venue: IJCNN'99: Proceedings of the 1999 International Joint Conference on Neural Networks, IEEE

Citations: 6 (4 self)

### BibTeX

    @INPROCEEDINGS{Vehtari99bayesianneural,
      author    = {Aki Vehtari and Jouko Lampinen},
      title     = {Bayesian Neural Networks with Correlating Residuals},
      booktitle = {IJCNN'99: Proceedings of the 1999 International Joint Conference on Neural Networks},
      year      = {1999},
      publisher = {IEEE}
    }

### Abstract

Usually in multivariate regression problems it is assumed that the residuals of the outputs are independent of each other. In many applications a more realistic model would allow dependencies between the outputs. In this paper we show how a Bayesian treatment using Markov chain Monte Carlo (MCMC) methods can allow for a full covariance matrix with Multi-Layer Perceptron (MLP) neural networks.
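The abstract's core idea, modeling correlated output residuals with a full noise covariance matrix rather than assuming independence, can be illustrated with a minimal sketch. The synthetic data and function names below are ours, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-output regression residuals with correlated noise
# (illustrative data, not from the paper).
true_cov = np.array([[1.0, 0.8],
                     [0.8, 1.0]])
z = rng.multivariate_normal(np.zeros(2), true_cov, size=500)  # residuals

def gaussian_loglik(z, cov):
    """Log-likelihood of residuals under a zero-mean Gaussian noise model."""
    n, d = z.shape
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum('ij,jk,ik->', z, np.linalg.inv(cov), z)
    return -0.5 * (n * d * np.log(2 * np.pi) + n * logdet + quad)

full_cov = np.cov(z.T)                 # full covariance estimate
diag_cov = np.diag(np.diag(full_cov))  # independence assumption

# When residuals are correlated, the full-covariance noise model
# fits markedly better than the independent-outputs model.
assert gaussian_loglik(z, full_cov) > gaussian_loglik(z, diag_cov)
```

The gap between the two log-likelihoods is exactly what the paper's full-covariance treatment exploits; with uncorrelated residuals the two models would fit about equally well.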

### Citations

4027 | Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images - Geman, Geman - 1984

Citation Context: ...generated using a Markov chain that has the desired posterior distribution as its equilibrium distribution. Neal has used the hybrid Monte Carlo (HMC) algorithm [2] for parameters and Gibbs sampling [5, 3, 7] for hyperparameters. HMC is an elaborate Metropolis-Hastings Monte Carlo method, which makes efficient use of gradient information to reduce random walk behavior. The gradient indicates in which dire...
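The hybrid Monte Carlo update described in this excerpt, leapfrog dynamics guided by the gradient followed by a Metropolis accept step, can be sketched generically. This uses a toy Gaussian target as a stand-in, not the authors' MLP-weight posterior:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy log-posterior: standard Gaussian. The point is the gradient-guided
# proposal, not the target itself.
def log_p(w):
    return -0.5 * np.dot(w, w)

def grad_log_p(w):
    return -w

def hmc_step(w, eps=0.1, n_leapfrog=20):
    """One hybrid Monte Carlo step: leapfrog dynamics + Metropolis accept."""
    p = rng.standard_normal(w.shape)           # resample momentum
    w_new, p_new = w.copy(), p.copy()
    p_new += 0.5 * eps * grad_log_p(w_new)     # half step for momentum
    for _ in range(n_leapfrog - 1):
        w_new += eps * p_new                   # full step for position
        p_new += eps * grad_log_p(w_new)       # full step for momentum
    w_new += eps * p_new
    p_new += 0.5 * eps * grad_log_p(w_new)     # final half step
    # Metropolis acceptance on the total energy difference
    log_accept = (log_p(w_new) - 0.5 * p_new @ p_new) \
               - (log_p(w) - 0.5 * p @ p)
    return w_new if np.log(rng.uniform()) < log_accept else w

w = np.zeros(5)
samples = []
for _ in range(2000):
    w = hmc_step(w)
    samples.append(w.copy())
samples = np.array(samples)
```

Because the momentum carries the chain in a consistent direction along the gradient, successive samples decorrelate much faster than under a random-walk Metropolis proposal.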

1464 | Bayesian Data Analysis - Gelman, Carlin, et al. - 1995

Citation Context: ...specified in terms of corresponding precisions σ⁻², with given Gamma distributions. (Inverse) Gamma distribution in Bayesian analysis has been discussed, e.g., by Box & Tiao [1] and Gelman et al. [4]. Gamma distribution in connection with MLPs has been discussed by Neal [10]. Following [10] we use four independent Gaussian prior distributions for different weight groups in MLP. Gaussians have fix...

643 | Bayesian Learning in Neural Networks - Neal - 1996

Citation Context: ...of samples in order to get good results. These limitations can be overcome in a Bayesian treatment. Bayesian neural networks with independent output noises have been discussed by MacKay [8] and Neal [9, 10]. Purpose of this paper is to show how this problem can be solved using full covariance matrix with Bayesian treatment and Markov Chain Monte Carlo (MCMC) methods. We begin by briefly reviewing the Ba...

531 | Bayesian Inference in Statistical Analysis - Box, Tiao - 1973

Citation Context: ...rior distributions are specified in terms of corresponding precisions σ⁻², with given Gamma distributions. (Inverse) Gamma distribution in Bayesian analysis has been discussed, e.g., by Box & Tiao [1] and Gelman et al. [4]. Gamma distribution in connection with MLPs has been discussed by Neal [10]. Following [10] we use four independent Gaussian prior distributions for different weight groups in M...

427 | A practical Bayesian framework for backpropagation networks - MacKay - 1992

Citation Context: ...large number of samples in order to get good results. These limitations can be overcome in a Bayesian treatment. Bayesian neural networks with independent output noises have been discussed by MacKay [8] and Neal [9, 10]. Purpose of this paper is to show how this problem can be solved using full covariance matrix with Bayesian treatment and Markov Chain Monte Carlo (MCMC) methods. We begin by briefly...

184 | Illustration of Bayesian inference in normal data models using Gibbs sampling - Gelfand, Hills, et al. - 1990

Citation Context: ...generated using a Markov chain that has the desired posterior distribution as its equilibrium distribution. Neal has used the hybrid Monte Carlo (HMC) algorithm [2] for parameters and Gibbs sampling [5, 3, 7] for hyperparameters. HMC is an elaborate Metropolis-Hastings Monte Carlo method, which makes efficient use of gradient information to reduce random walk behavior. The gradient indicates in which dire...

136 | Random number generation and Monte Carlo methods - Gentle - 2003

Citation Context: ...full conditional distribution p(σ² | {zᵢ}) is an inverse gamma distribution with parameters (ν₀ + n)/2 and (ν₀σ₀² + Σᵢ zᵢ²)/2 (Eq. 11). Many software packages generate gamma random variables directly. See, e.g., [6] for the algorithms. [Sec. 4, Full covariance:] Neal discusses and demonstrates his methods using diagonal covariance matrix. Datasets used in [10, 11] do not have multivariate targets or they are artificial d...
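The conjugate Gibbs update quoted above, drawing the noise variance from its inverse-gamma full conditional given the residuals, might look like this in outline. The hyperparameter names `nu0` and `s0sq` and the exact parameterization are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_noise_variance(z, nu0=1.0, s0sq=1.0):
    """Gibbs update: draw the noise variance from its inverse-gamma full
    conditional given residuals z, under a conjugate prior with
    (hypothetical) hyperparameters nu0 and s0sq."""
    n = len(z)
    shape = (nu0 + n) / 2.0
    rate = (nu0 * s0sq + np.sum(z ** 2)) / 2.0
    # sigma^-2 ~ Gamma(shape, rate)  <=>  sigma^2 ~ Inv-Gamma(shape, rate)
    precision = rng.gamma(shape, 1.0 / rate)   # NumPy takes scale = 1/rate
    return 1.0 / precision

# With many residuals the draws concentrate near the empirical variance.
z = rng.normal(0.0, 2.0, size=5000)            # true noise variance = 4
draws = [sample_noise_variance(z) for _ in range(200)]
```

As the excerpt notes, the practical appeal of this update is that gamma variates can be generated directly by standard library routines, so no Metropolis step is needed for the noise hyperparameter.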

43 | Assessing relevance determination methods using DELVE - Neal - 1998

40 | Bayesian training of backpropagation networks by the Hybrid Monte Carlo method - Neal - 1992

Citation Context: ...of samples in order to get good results. These limitations can be overcome in a Bayesian treatment. Bayesian neural networks with independent output noises have been discussed by MacKay [8] and Neal [9, 10]. Purpose of this paper is to show how this problem can be solved using full covariance matrix with Bayesian treatment and Markov Chain Monte Carlo (MCMC) methods. We begin by briefly reviewing the Ba...

33 | Bayesian deviance, the effective number of parameters, and the comparison of arbitrarily complex models. Manuscript submitted for publication - Spiegelhalter, Best, et al. - 1998


2 | Bayesian Learning For Neural Networks - Neal - 1995


1 | Using neural networks to model conditional variate densities - Williams - 1996

Citation Context: ...stic to allow dependencies between the outputs. This can be achieved with full covariance matrix. Use of full covariance matrix with maximum a posteriori (MAP) approach has been discussed by Williams [13]. However, MAP gives biased results (e.g., noise variance being systematically under-estimated) and requires large number of samples in order to get good results. These limitations can be overcome in ...
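The systematic under-estimation of noise variance mentioned in this excerpt can be seen in a small sketch: for a flexible model fit by maximum likelihood (the flat-prior MAP case), the residual-based estimate RSS/n sits below the unbiased RSS/(n − p). This is a synthetic example of the general phenomenon, not the setup of [13]:

```python
import numpy as np

rng = np.random.default_rng(3)

# Flexible linear model on pure noise: p of the n degrees of freedom
# are absorbed by the fit, shrinking the residuals.
n, p = 30, 20
X = rng.standard_normal((n, p))
y = rng.normal(0.0, 1.0, size=n)        # true noise variance = 1
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

mle_var = np.mean(resid ** 2)           # ML / flat-prior MAP estimate
unbiased = np.sum(resid ** 2) / (n - p) # degrees-of-freedom correction
```

Averaging over the posterior, as the Bayesian treatment in this paper does, avoids this bias because the variance is integrated over rather than fixed at the mode.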