### Citations

3924 |
Dynamic Programming
- Bellman
- 1957
Citation Context ... for inverting HBMs commonly used in neuroimaging, including DCM for fMRI and a standard three-level HGF for learning and decision-making tasks. Critically, to deal with the “curse of dimensionality” (Bellman, 1957) in applications where the number of parameters to optimise over is fairly high (e.g., >30 parameters as often encountered in DCMs), we introduce a novel combination of GPO and local gradient methods ... |

1930 | Information Theory, Inference and Learning Algorithms
- MacKay
- 2003
Citation Context ... like to obtain an approximation to the log evidence (or log marginal likelihood) as a measure for the balance between the fit (accuracy) of the model and its complexity (Bayesian model selection, BMS; MacKay, 2003). However, as the model evidence is an integral over the joint probability, it is usually prohibitively expensive to compute. One solution to this problem is variational model inversion: by optimisin... |

1817 | Bayes factors
- Kass, Raftery
- 1995
Citation Context ...conventional QN method was larger than 3 in both cases; this value corresponds to an approximate Bayes factor of 20 and is usually considered as a threshold for decision in Bayesian model comparison (Kass & Raftery, 1995). Notably, these analyses only test for linear relations between model parameter estimates and behavioural/questionnaire data, and we cannot exclude that the three methods might perform differently if n... |

1176 |
A neural substrate of prediction and reward
- Schultz, Dayan, et al.
- 1997
Citation Context ...ic parameters are inferred through variationaling (for review, see Friston and Dolan, 2010). These models can be categorised into two groups. First, models of information processing ference learning (Schultz et al., 1997) to more co models (e.g., Behrens et al., 2007). One particulathe advent of fMRI in the early 1990s, a first decade of neuroimaging research focused on the problem of mapping, i.e., where particular ... |

1017 |
A theory of Pavlovian conditioning: Variations on the effectiveness of reinforcement and non-reinforcement
- Rescorla, Wagner
- 1972
Citation Context ...ructures like the insula (Preuschoff et al., 2008). Over the years, the models employed have become increasingly complex, from classical reinforcement learning models such as the Rescorla-Wagner model (Rescorla and Wagner, 1972) or temporal dif-prediction errors. These computational quant ⁎ Corresponding author at: Wilfriedstrasse 6, 8032 Zu 634 91 31. E-mail address: ekaterina.lomakina@inf.ethz.ch (E.I. Lo http://dx.doi.or... |

624 |
A view of cloud computing
- Armbrust, Fox, et al.
- 2009
Citation Context ...e energy GP 0.61 (p = 0.04) −134.38 QN 0.50 (p = 0.11) −137.39 MH 0.62 (p = 0.03) −134.54 144 E.I. Lomakina et al. / NeuroImage 118 (2015) 133–145 the HGF). Generally, improvements in cloud computing (Armbrust et al., 2010) and the use of GPUs (Wang et al., 2013) may turn MCMC into a competitive alternative for practical use in the future. In terms of accuracy, in our high-noise simulation scenario MCMC was slightly mo... |

624 |
The Monte Carlo method
- Metropolis, Ulam
- 1949
Citation Context ...sion focused on DCM, see Daunizeau et al., 2011). Commonly employed inversion methods are either very slow, e.g. Markov Chain Monte Carlo Methods (MCMC) such as the Metropolis–Hastings algorithm (MH; Metropolis and Ulam, 1949), or are susceptible to local minima, such as gradient descent schemes used in variational Bayesian methods (Friston et al., 2007; Daunizeau et al., 2014). Here, we consider Gaussian process optimisation... |

480 |
Efficient Global Optimization of Expensive Black-Box Functions
- Jones, Schonlau, et al.
- 1998
Citation Context ...n optimisation algorithm based on GP has the following general structure described in Box 2: There are a number of possible criteria for convergence, i.e. Expected Improvement (cf. Mockus et al., 1978; Jones et al., 1998). However, in this work, we primarily focus on the Upper Confidence Bound (UCB) criterion (Srinivas et al., 2012) due to both its simplicity and robustness in practical applications: UCB(μ(x|S), σ... |

416 |
Dynamic Causal Modeling.
- Friston, Harrison, et al.
- 2003
Citation Context ...inversion. The second group of models concerns the physiological processes underlying neuroimaging data. In particular, these are dynamic causal models (DCMs) of fMRI (Friston et al., 2003) or electrophysiological data (David et al., 2006). As generative models, DCMs embody a such as Gauss–Newton descent schemes are too vulnerable to local minima. This general framework, Bayesian Global... |

340 | Near-Optimal sensor placements in Gaussian Processes: Theory, efficient algorithms and empirical - Krause, Singh, et al. |

199 | Temporal difference models and reward-related learning in the human brain.
- O’Doherty, Dayan, et al.
- 2003
Citation Context ... variety of mathematical models to neuroimagthe encoding of different types of prediction errors and uncertainty in brain regions such as the dopaminergic midbrain (D'Ardenne et al., 2008), striatum (O'Doherty et al., 2003), or the basal forebrain (Iglesias et al., 2013), and cortical structures like the insula (Preuschoff et al., 2008). Over the years, the models employed have become increasingly complex, from classica... |

165 | Nonlinear responses in fMRI: the balloon model, Volterra kernels, and other hemodynamics
- Friston, Mechelli, et al.
- 2000
Citation Context ...the precision parameter β (see Eq. (4)) can exploit a priori knowledge about the target function,nal dynamics are translated into regionally specific BOLD signals through a hemodynamic forward model (Friston et al., 2000; Stephan et al., 2007). The DCM parameter set thus consists of three subsets: parameters of the neuronal state equations (“connectivity parameters”), parameters of the hemodynamic forward model (“hem... |

158 | Gaussian process dynamical models for human motion
- Wang, Fleet, et al.
- 2008
Citation Context ...ter estimation (Fig. 7, Table 2). While a few previous studies have explored the use of Gaussian processes for identification of dynamic systems and hierarchical models (e.g., Ažman and Kocijan, 2011; Wang et al., 2008), the present work is, to our knowledge, novel in four ways. First, we present a simple but effective strategy for boosting computational performance of GPO by embedding a local gradient-based search... |

127 |
A statistical approach to some basic mine valuation problems on the Witwatersrand
- Krige
- 1951
Citation Context ...rtance for the overall scheme in more detail below, it should be emphasised that using GP for BGO is well-established. For example, using GP mean estimates for approximation corresponds to “kriging” (Krige, 1951). By contrast, more recent approaches to efficient global optimisation exploit both mean and variance estimates provided by GP (Osborne et al., 2009). Gaussian processes This section provides a short... |

123 | Gaussian process optimization in the bandit setting: No regret and experimental design. Arxiv preprint arXiv:0912.3995
- Srinivas, Krause, et al.
- 2009
Citation Context ...n-sampled points, it enables us to derive a number of criteria which guide the sampling from hitherto unexplored domains, such as Expected Improvement (Mockus et al., 1978) or Upper Confidence Bound (Srinivas et al., 2010). These criteria suggest a principled exploration-exploitation trade-off. An outline of the general BGO algorithm is provided in Box 1. Notably, the approximating function can be any regression funct... |

114 | Comparing dynamic causal models.
- Penny, Stephan, et al.
- 2004
Citation Context ...An additional key goal is to obtain an approximation to the log-evidence in order to perform model selection, i.e., comparing alternative explanations of how the observed data could have been caused (Penny et al., 2004). Many of the computational and physiological models mentioned above, e.g. DCM and HGF, can be understood as special cases of hierarchical Bayesian models (HBMs). These models are powerful tools for ... |

109 | Variational free energy and the Laplace approximation.
- Friston, Mattout, et al.
- 2007
Citation Context ...lo Methods (MCMC) such as the Metropolis–Hastings algorithm (MH; Metropolis and Ulam, 1949), or are susceptible to local minima, such as gradient descent schemes used in variational Bayesian methods (Friston et al., 2007; Daunizeau et al., 2014). Here, we consider Gaussian process optimisation (GPO; Osborne et al., 2009; Frean and Boyle, 2008) as an alternative to MCMC and variational methods. GPO offers three potent... |

108 |
Replica Monte Carlo simulation of spin glasses. Phys Rev Lett
- Swendsen, Wang
- 1986
Citation Context ...(GPU). This toolbox includes an implementation of parallel tempering for DCM, an extended version of the Metropolis Hastings algorithm that simulates parallel but connected Markov Monte Carlo Chains (Swendsen and Wang, 1986; Laskey and Myers, 2003) and increases the statistical efficiency of sampling. Parallel tempering is particularly efficient when the posterior distribution is multimodal as shown by Calderhead and Gi... |

84 | Assessing convergence of Markov chain Monte Carlo algorithms
- Brooks, Roberts
- 1998
Citation Context ...itialized chains (not dissimilar to a classical ANOVA) in order to verify whether the chains have resulted in non-distinguishable distributions (for a description of both diagnostics see pp. 319–335, Brooks and Roberts, 1998). Notably, in addition to different numerical strategies, an important difference between these optimisation schemes is the choice of objective function. That is, our GPO and MH implementations optim... |

70 | Dynamic causal modeling of evoked responses in
- David, Kiebel, et al.
- 2006
Citation Context ...d group of models concerns the physiological processes underlying neuroimaging data. In particular, these are dynamic causal models (DCMs) of fMRI (Friston et al., 2003) or electrophysiological data (David et al., 2006). As generative models, DCMs embody a such as Gauss–Newton descent schemes are too vulnerable to local minima. This general framework, Bayesian Global Optimisation (BGO; Mockus et al., 1978), offers a... |

65 |
BOLD response reflecting dopaminergic signals in the human ventral tegmental area
- D’Ardenne, McClure, et al.
- 2008
Citation Context ...is was accomplished by introducing a variety of mathematical models to neuroimagthe encoding of different types of prediction errors and uncertainty in brain regions such as the dopaminergic midbrain (D'Ardenne et al., 2008), striatum (O'Doherty et al., 2003), or the basal forebrain (Iglesias et al., 2013), and cortical structures like the insula (Preuschoff et al., 2008). Over the years, the models employed have become ... |

64 |
Bayesian model selection for group studies.
- Stephan, Penny, et al.
- 2009
Citation Context ... on model selection results, we computed the free energy for each model in each subject by applying a Laplace approximation to the log joint. We then performed random effects Bayesian model selection (Stephan et al., 2009) and found that while all three methods produced the same ranking of Fig. 8. Comparison of difference of log joint values across the subject (rows) and models (column difference between methods, red = s... |

62 |
On Bayesian methods for seeking the extremum.
- Mockus
- 1974
Citation Context ...gical data (David et al., 2006). As generative models, DCMs embody a such as Gauss–Newton descent schemes are too vulnerable to local minima. This general framework, Bayesian Global Optimisation (BGO; Mockus et al., 1978), offers a useful compromise between MH and local methods. The underlying idea of BGO is to approximate the target function with some easy-to-evaluate proxy based on a set of points over which the tar... |

58 |
Human insula activation reflects risk prediction errors as well as risk.
- Preuschoff, Quartz, et al.
- 2008
Citation Context ...n brain regions such as the dopaminergic midbrain (D'Ardenne et al., 2008), striatum (O'Doherty et al., 2003), or the basal forebrain (Iglesias et al., 2013), and cortical structures like the insula (Preuschoff et al., 2008). Over the years, the models employed have become increasingly complex, from classical reinforcement learning models such as the Rescorla-Wagner model (Rescorla and Wagner, 1972) or temporal dif-predic... |

45 |
Learning the value of information in an uncertain
- Behrens, Woolrich, et al.
- 2007
Citation Context ...ng (for review, see Friston and Dolan, 2010). These models can be categorised into two groups. First, models of information processing ference learning (Schultz et al., 1997) to more co models (e.g., Behrens et al., 2007). One particulathe advent of fMRI in the early 1990s, a first decade of neuroimaging research focused on the problem of mapping, i.e., where particular cognitive functions are implemented, by localis... |

41 | Gaussian processes for global optimization.
- Osborne, Garnett, et al.
- 2009
Citation Context ...usceptible to local minima, such as gradient descent schemes used in variational Bayesian methods (Friston et al., 2007; Daunizeau et al., 2014). Here, we consider Gaussian process optimisation (GPO; Osborne et al., 2009; Frean and Boyle, 2008) as an alternative to MCMC and variational methods. GPO offers three potential advantages: (i) as a global optimisation method for sufficiently smooth and efficiently evaluable... |

33 | Comparing hemodynamic models with DCM.
- Stephan, Weiskopf, et al.
- 2007
Citation Context ...r β (see Eq. (4)) can exploit a priori knowledge about the target function,nal dynamics are translated into regionally specific BOLD signals through a hemodynamic forward model (Friston et al., 2000; Stephan et al., 2007). The DCM parameter set thus consists of three subsets: parameters of the neuronal state equations (“connectivity parameters”), parameters of the hemodynamic forward model (“hemodynamic parameters”) ... |

32 | Estimating Bayes factors via thermodynamic integration and population MCMC - Calderhead, Girolami |

31 |
Dynamic causal modelling: A critical review of the biophysical and statistical foundations
- Daunizeau, David, et al.
- 2011
Citation Context ...pread application. One potential problem is, however, that statistical inference can be difficult due to the computational challenges of model inversion (for an earlier discussion focused on DCM, see Daunizeau et al., 2011). Commonly employed inversion methods are either very slow, e.g. Markov Chain Monte Carlo Methods (MCMC) such as the Metropolis–Hastings algorithm (MH; Metropolis and Ulam, 1949), or are susceptible ... |

29 |
Quantitative prediction of subjective pain intensity from whole-brain fMRI data using Gaussian processes. Neuroimage 49
- Marquand, Howard, et al.
- 2010
Citation Context ... technical issues, such as optimisation of hyperparameters and the challenge by the ‘curse of dimensionality’. While GPs have found application for classification analyses of neuroimaging data (e.g., Marquand et al., 2010; Salimi-Khorshidi et al., 2011; Mourão-Miranda et al., 2012; Pyka et al., 2013), they have found little application in computational modelling of brain physiology or cognition so far. We therefore dis... |

26 | Information-theoretic regret bounds for gaussian process optimization in the bandit setting.
- Srinivas, Krause, et al.
- 2012
Citation Context ...of possible criteria for convergence, i.e. Expected Improvement (cf. Mockus et al., 1978; Jones et al., 1998). However, in this work, we primarily focus on the Upper Confidence Bound (UCB) criterion (Srinivas et al., 2012) due to both its simplicity and robustness in practical applications: $\mathrm{UCB}\big(\mu(x \mid S), \sigma^2(x \mid S)\big) = \mu(x \mid S) + \alpha\,\sigma^2(x \mid S)$, (7) where α ≥ 0 enables control of the exploration–exploitation trade-off.... |

21 |
Dynamic causal modelling of evoked responses: the role of intrinsic connections.
- Kiebel, Garrido, et al.
- 2007
Citation Context ...ceivable for local extrema to become a more serious problem when inverting DCMs with less smoothness and more pronounced non-convexity, such as DCMs for electrophysiological data (David et al., 2006; Kiebel et al., 2007; Chen et al., 2008; Moran et al., 2009; Marreiros et al., 2010). In future Table 5 The table shows R2 for multivariate linear regression and negative free energy for Variational Bayesian Regression wh... |

18 | Dynamic causal modelling of induced responses
- Chen, Kiebel, et al.
- 2008
Citation Context ...trema to become a more serious problem when inverting DCMs with less smoothness and more pronounced non-convexity, such as DCMs for electrophysiological data (David et al., 2006; Kiebel et al., 2007; Chen et al., 2008; Moran et al., 2009; Marreiros et al., 2010). In future Table 5 The table shows R2 for multivariate linear regression and negative free energy for Variational Bayesian Regression when all model parame... |

17 |
Weakly supervised structured output learning for semantic segmentation. CVPR
- Vezhnevets, Ferrari, et al.
- 2012
Citation Context ...uation approach as described for DCM above, i.e., we generated synthetic data and tested the accuracy of HGF parameter estimation for different optimisation(of the adviser's intentions).et al., 2008; Vezhnevets et al., 2012; Osborne et al., 2009). However, in some of the problems mentioned above, the dimensionality of the parameter space can be quite high (e.g., DCMs for fMRI often have several dozen parameters). This p... |

16 |
Dynamic causal models of steady-state responses
- Moran, Stephan, et al.
- 2009
Citation Context ...ore serious problem when inverting DCMs with less smoothness and more pronounced non-convexity, such as DCMs for electrophysiological data (David et al., 2006; Kiebel et al., 2007; Chen et al., 2008; Moran et al., 2009; Marreiros et al., 2010). In future Table 5 The table shows R2 for multivariate linear regression and negative free energy for Variational Bayesian Regression when all model parameter estimates (excep... |

14 | Using Gaussian processes to optimize expensive functions
- Frean, Boyle
- 2008
Citation Context ...nima, such as gradient descent schemes used in variational Bayesian methods (Friston et al., 2007; Daunizeau et al., 2014). Here, we consider Gaussian process optimisation (GPO; Osborne et al., 2009; Frean and Boyle, 2008) as an alternative to MCMC and variational methods. GPO offers three potential advantages: (i) as a global optimisation method for sufficiently smooth and efficiently evaluable objective functions it... |

14 |
A Bayesian foundation for individual learning under uncertainty. Front Hum Neurosci
- Mathys, Daunizeau, et al.
- 2011
Citation Context ...ounts of cognition which serve to infer, from the observed subject-specific behaviour, trajectories of computational quantities such as ine in more detail below, the Hierarchical Gaussian Filter (HGF; Mathys et al., 2011), describes a hierarchy of coupled belief updating processes whose subject-specific parameters are inferred through variationaling (for review, see Friston and Dolan, 2010). These models can be categ... |

13 | Model-based approaches to neuroimaging: combining reinforcement learning theory with fMRI data - Glascher, O'Doherty - 2010 |

11 | Pattern recognition and functional neuroimaging help to discriminate healthy adolescents at risk for mood disorders from low risk adolescents.
- Mourao-Miranda, Oliveira, et al.
- 2012
Citation Context ...ters and the challenge by the ‘curse of dimensionality’. While GPs have found application for classification analyses of neuroimaging data (e.g., Marquand et al., 2010; Salimi-Khorshidi et al., 2011; Mourão-Miranda et al., 2012; Pyka et al., 2013), they have found little application in computational modelling of brain physiology or cognition so far. We therefore discuss some of their central properties in a tutorial-like fas... |

10 |
Computational and dynamic models in neuroimaging
- Friston, Dolan
- 2010
Citation Context ...rarchical Gaussian Filter (HGF; Mathys et al., 2011), describes a hierarchy of coupled belief updating processes whose subject-specific parameters are inferred through variationaling (for review, see Friston and Dolan, 2010). These models can be categorised into two groups. First, models of information processing ference learning (Schultz et al., 1997) to more co models (e.g., Behrens et al., 2007). One particulathe adv... |

9 |
A dynamic causal model study of neuronal population dynamics
- Marreiros, Kiebel, et al.
- 2010
Citation Context ...when inverting DCMs with less smoothness and more pronounced non-convexity, such as DCMs for electrophysiological data (David et al., 2006; Kiebel et al., 2007; Chen et al., 2008; Moran et al., 2009; Marreiros et al., 2010). In future Table 5 The table shows R2 for multivariate linear regression and negative free energy for Variational Bayesian Regression when all model parameter estimates (except for κ) are used to pre... |

5 | Exploring an adaptive Metropolis algorithm. - Shaby, Wells - 2011 |

3 |
A Metropolis-Hastings algorithm for dynamic causal models
- Chumbley, Friston, et al.
- 2007
Citation Context ... and then challenged the different inversion methods to recover the known parameter values. This complements previous analyses of empirical fMRI data which compared VB and MCMC for inversion of DCMs (Chumbley et al., 2007). By contrast, for the HGF, simulation studies of model inversion already exist (albeit based on a simpler HGF than the one used here; Mathys et al., 2014) and we turned to empirical data. Clearly, her... |

3 | VBA: A probabilistic treatment of nonlinear models for neurobiological and behavioural data. PLoS Comput. Biol
- Daunizeau, Adam, et al.
- 2014
Citation Context ... as the Metropolis–Hastings algorithm (MH; Metropolis and Ulam, 1949), or are susceptible to local minima, such as gradient descent schemes used in variational Bayesian methods (Friston et al., 2007; Daunizeau et al., 2014). Here, we consider Gaussian process optimisation (GPO; Osborne et al., 2009; Frean and Boyle, 2008) as an alternative to MCMC and variational methods. GPO offers three potential advantages: (i) as a... |

3 |
Population Markov Chain Monte
- Laskey, Myers
- 2003
Citation Context ...des an implementation of parallel tempering for DCM, an extended version of the Metropolis Hastings algorithm that simulates parallel but connected Markov Monte Carlo Chains (Swendsen and Wang, 1986; Laskey and Myers, 2003) and increases the statistical efficiency of sampling. Parallel tempering is particularly efficient when the posterior distribution is multimodal as shown by Calderhead and Girolami (2009). During th... |

2 | Inferring on the intentions of others by hierarchical Bayesian learning. PLoS Comput. Biol. 10:e1003810. doi: 10.1371/journal.pcbi.1003810
- Diaconescu, Mathys, et al.
- 2014
Citation Context ... convex objective function is likely to hold. Present applications of the hierarchical Bayesian model of cognition move to increasingly more complex models with more complicated objective functions (e.g., Diaconescu et al., 2014). This increase in complexity raises the question whether the equivalence of optimisation methods still holds. Here, we examine this issue, using a set of nine different HGFs, some of which are consider... |

2 |
Baseline Activity Predicts Working Memory Load of Preceding Task Condition. HBM. http://dx.doi.org/10.1002/hbm.22121
- Pyka, Hahn, et al.
- 2012
Citation Context ... ‘curse of dimensionality’. While GPs have found application for classification analyses of neuroimaging data (e.g., Marquand et al., 2010; Salimi-Khorshidi et al., 2011; Mourão-Miranda et al., 2012; Pyka et al., 2013), they have found little application in computational modelling of brain physiology or cognition so far. We therefore discuss some of their central properties in a tutorial-like fashion. In the “Resul... |

2 |
Using Gaussian process regression for meta-analytic neuroimaging inference based on sparse observations
- Salimi-Khorshidi, Nichols, et al.
- 2011
Citation Context ... as optimisation of hyperparameters and the challenge by the ‘curse of dimensionality’. While GPs have found application for classification analyses of neuroimaging data (e.g., Marquand et al., 2010; Salimi-Khorshidi et al., 2011; Mourão-Miranda et al., 2012; Pyka et al., 2013), they have found little application in computational modelling of brain physiology or cognition so far. We therefore discuss some of their central prop... |

1 | mpdcm: A toolbox for Massively Parallel Dynamic Causal Modeling (in preparation) - Aponte, Raman, et al. |

1 |
Dynamical systems identification using Gaussian process models with incorporated local models
- Ažman, Kocijan
- 2011
Citation Context ...rate than GPO for parameter estimation (Fig. 7, Table 2). While a few previous studies have explored the use of Gaussian processes for identification of dynamic systems and hierarchical models (e.g., Ažman and Kocijan, 2011; Wang et al., 2008), the present work is, to our knowledge, novel in four ways. First, we present a simple but effective strategy for boosting computational performance of GPO by embedding a local gra... |

1 | Automatic Model Construction with Gaussian Processes - Duvenaud - 2014 |