## Measure Transformer Semantics for Bayesian Machine Learning

Citations: 4 (2 self)

### BibTeX

@MISC{Borgström_measuretransformer,

author = {Johannes Borgström and Andrew D. Gordon and Michael Greenberg and James Margetson and Jurgen Van Gael},

title = {Measure Transformer Semantics for Bayesian Machine Learning},

year = {}

}

### Abstract

The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define combinators for measure transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.
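The idea of sampling and observing as measure transformers can be illustrated with a minimal sketch for the finite, discrete case only. This is not the paper's calculus or its combinator definitions: the names `bernoulli`, `extend`, `observe`, `compose`, and `normalize` are hypothetical helpers chosen for this illustration, and a measure is assumed to be a plain dictionary from outcomes to weights. Continuous and hybrid measures, and observations of zero-probability events, are outside what this sketch covers.

```python
from fractions import Fraction

# Assumption: a finite measure is a dict mapping outcomes to non-negative
# weights, and a measure transformer is a function from measures to measures.

def bernoulli(p):
    """Finite measure of a biased coin (plays the role of a prior)."""
    return {True: p, False: 1 - p}

def extend(kernel):
    """Pair each outcome x with a fresh sample drawn from kernel(x)."""
    def transformer(mu):
        out = {}
        for x, wx in mu.items():
            for y, wy in kernel(x).items():
                out[(x, y)] = out.get((x, y), 0) + wx * wy
        return out
    return transformer

def observe(pred):
    """Restrict a measure to outcomes satisfying pred (no renormalization)."""
    def transformer(mu):
        return {x: w for x, w in mu.items() if pred(x)}
    return transformer

def compose(*transformers):
    """Apply the given transformers from left to right."""
    def transformer(mu):
        for t in transformers:
            mu = t(mu)
        return mu
    return transformer

def normalize(mu):
    """Turn a non-zero finite measure into a probability distribution."""
    total = sum(mu.values())
    return {x: w / total for x, w in mu.items()}

half = Fraction(1, 2)
model = compose(
    extend(lambda _: bernoulli(half)),      # sample a second fair coin
    observe(lambda xy: xy[0] or xy[1]),     # observe: at least one is heads
)
posterior = normalize(model(bernoulli(half)))
```

Here `posterior` assigns weight 1/3 to each of the three outcomes that survive the observation. Keeping `observe` unnormalized and normalizing only at the end is a design choice of this sketch; it keeps each step a simple function on measures.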

### Citations

1255 | Factor graphs and the sum-product algorithm
- Kschischang, Frey, et al.
Citation Context ...rative language with an informal probabilistic semantics. Csoft is the native language of Infer.NET [25], a software library for Bayesian reasoning. A compiler turns Csoft programs into factor graphs [18], data structures that support efficient inference algorithms [15]. This paper borrows ideas from Csoft and extends them, placing the semantics on a firm footing. Bayesian Models as Probabilistic Expr...

341 | Reconciling two views of cryptography: The computational soundness of formal encryption
- Abadi, Rogaway
- 2000
Citation Context ... with formal semantics find application in many areas apart from machine learning, including databases [6], model checking [19], differential privacy [24, 34], information flow [20], and cryptography [1]. A recent monograph on semantics for labelled Markov processes [29] focuses on bisimulation-based equational reasoning. The syntax and semantics of Imp is modelled on an existing probabilistic langua...

328 | Expectation propagation for approximate Bayesian inference. UAI
- Minka
- 2001
Citation Context ...s as a collection of multiplicative factors. Factor graphs are an effective means of stating conditional independence properties between variables, and enable efficient algebraic inference techniques [27, 38] as well as sampling techniques [15, Chapter 12]. We use factor graphs with gates [26] for modelling if-then-else clauses; gates introduce second-order edges in the graph. Factor Graphs: G ::= new x :...

140 | Semantics of probabilistic programs
- Kozen
- 1981
Citation Context ...ed with recursive computation. One of the first semantics is for Probabilistic LCF [35], which augments the core functional language LCF with weighted binary choice, for discrete distributions. Kozen [17] develops a probabilistic semantics for while-programs augmented with random assignment. He develops two provably equivalent semantics; one more operational, and the other a denotational semantics usi...

87 | Variational message passing
- Winn, Bishop
- 2005
Citation Context ...s as a collection of multiplicative factors. Factor graphs are an effective means of stating conditional independence properties between variables, and enable efficient algebraic inference techniques [27, 38] as well as sampling techniques [15, Chapter 12]. We use factor graphs with gates [26] for modelling if-then-else clauses; gates introduce second-order edges in the graph. Factor Graphs: G ::= new x :...

82 | Quantifying information flow
- Lowe
- 2002
Citation Context ...Probabilistic languages with formal semantics find application in many areas apart from machine learning, including databases [6], model checking [19], differential privacy [24, 34], information flow [20], and cryptography [1]. A recent monograph on semantics for labelled Markov processes [29] focuses on bisimulation-based equational reasoning. The syntax and semantics of Imp is modelled on an existin...

71 | Abstraction, refinement and proof for probabilistic systems. Monographs in computer science
- McIver, Morgan
- 2005
Citation Context ...efficient implementation via factor graphs, which led us to work directly with standard distributions and to have a semantics of observation that is independent of the program text. McIver and Morgan [23] develop a theory of abstraction and refinement for probabilistic while programs, based on weakest preconditions. They reject a subdistribution transformer semantics in order to admit demonic nondeter...

71 | Privacy integrated queries: an extensible platform for privacy-preserving data analysis
- McSherry
- 2009
Citation Context ...of Probabilistic Languages Probabilistic languages with formal semantics find application in many areas apart from machine learning, including databases [6], model checking [19], differential privacy [24, 34], information flow [20], and cryptography [1]. A recent monograph on semantics for labelled Markov processes [29] focuses on bisimulation-based equational reasoning. The syntax and semantics of Imp is...

62 | Church: A language for generative models
- Goodman, Mansinghka, et al.
- 2008
Citation Context ...e art for learning from data. The theme of this paper is the idea of writing Bayesian models as probabilistic programs, which was pioneered by Koller et al. [16] and is recently gaining in popularity [31, 30, 9, 4, 14]. In particular, we draw inspiration from Csoft [37], an imperative language with an informal probabilistic semantics. Csoft is the native language of Infer.NET [25], a software library for Bayesian r...

62 | IBAL: A probabilistic rational programming language
- Pfeffer
- 2001
Citation Context ...e art for learning from data. The theme of this paper is the idea of writing Bayesian models as probabilistic programs, which was pioneered by Koller et al. [16] and is recently gaining in popularity [31, 30, 9, 4, 14]. In particular, we draw inspiration from Csoft [37], an imperative language with an informal probabilistic semantics. Csoft is the native language of Infer.NET [25], a software library for Bayesian r...

62 | Stochastic Lambda Calculus and Monads of Probability Distributions
- Ramsey, Pfeffer
- 2002
Citation Context ...ther hand, observations are not present in Kozen’s language. Jones and Plotkin [13] investigate the probability monad, and apply it to languages with discrete probabilistic choice. Ramsey and Pfeffer [33] give a stochastic λ-calculus with a measure-theoretic semantics in the probability monad, and provide an embedding within Haskell; they do not consider observations. We can generalize the semantics o...

56 | Formal certification of code-based cryptographic proofs
- Barthe, Grégoire, et al.
- 2009
Citation Context ...ver assignments to the term’s free variables to a joint measure of the term’s return value and assignments to its free variables. This choice is a generalization of the (discrete) semantics of pWHILE [2]. First, we define a data structure for an evaluation environment assigning values to variable names, and corresponding operations. Given an environment Γ = x1:t1,...,xn:tn, we let S〈Γ〉 be the set of...

49 | Effective Bayesian inference for stochastic programs
- Koller, McAllester, et al.
- 1997
Citation Context ...he Bayesian paradigm is now the state of the art for learning from data. The theme of this paper is the idea of writing Bayesian models as probabilistic programs, which was pioneered by Koller et al. [16] and is recently gaining in popularity [31, 30, 9, 4, 14]. In particular, we draw inspiration from Csoft [37], an imperative language with an informal probabilistic semantics. Csoft is the native lang...

47 | FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs
- McCallum, Schultz, et al.
- 2009
Citation Context ...rams to Csoft programs, which are evaluated by Infer.NET by constructing suitable factor graphs. The implementation advantage of translating F# to Csoft, over simply generating factor graphs directly [22], is that the translation preserves the structure of the input model (including array processing in our full language), which can be exploited by the various inference algorithms supported by Infer.NE...

36 | Graphical Models
- Koller, Friedman
- 2009
Citation Context ... the native language of Infer.NET [25], a software library for Bayesian reasoning. A compiler turns Csoft programs into factor graphs [18], data structures that support efficient inference algorithms [15]. This paper borrows ideas from Csoft and extends them, placing the semantics on a firm footing. Bayesian Models as Probabilistic Expressions Consider a simplified form of TrueSkill [11], a large-scal...

34 | Probabilistic databases: diamonds in the dirt
- Dalvi, Ré, et al.
- 2009
Citation Context ... suitable for Monte Carlo analysis. Other Uses of Probabilistic Languages Probabilistic languages with formal semantics find application in many areas apart from machine learning, including databases [6], model checking [19], differential privacy [24, 34], information flow [20], and cryptography [1]. A recent monograph on semantics for labelled Markov processes [29] focuses on bisimulation-based equa...

30 | Stochastic processes as concurrent constraint programs
- Gupta, Jagadeesan, et al.
- 1999
Citation Context ...ded semantics is that x is constrained to be zero with probability 1 (so in particular µ(R) = 1). The probabilistic concurrent constraint programming language pcc of Gupta, Jagadeesan, and Panangaden [10] is also intended for describing probability distributions using independent sampling and constraints. Our use of observations corresponds to constraints on random variables in pcc. In the finite case...

29 | A probabilistic language based upon sampling functions
- Park, Pfenning, et al.
- 2005
Citation Context ...e art for learning from data. The theme of this paper is the idea of writing Bayesian models as probabilistic programs, which was pioneered by Koller et al. [16] and is recently gaining in popularity [31, 30, 9, 4, 14]. In particular, we draw inspiration from Csoft [37], an imperative language with an informal probabilistic semantics. Csoft is the native language of Infer.NET [25], a software library for Bayesian r...

22 | Quantitative Analysis with the Probabilistic Model Checker PRISM
- Kwiatkowska, Norman, et al.
- 2005
Citation Context ...Carlo analysis. Other Uses of Probabilistic Languages Probabilistic languages with formal semantics find application in many areas apart from machine learning, including databases [6], model checking [19], differential privacy [24, 34], information flow [20], and cryptography [1]. A recent monograph on semantics for labelled Markov processes [29] focuses on bisimulation-based equational reasoning. The...

20 | Distance makes the types grow stronger: A calculus for differential privacy
- Reed, Pierce
Citation Context ...of Probabilistic Languages Probabilistic languages with formal semantics find application in many areas apart from machine learning, including databases [6], model checking [19], differential privacy [24, 34], information flow [20], and cryptography [1]. A recent monograph on semantics for labelled Markov processes [29] focuses on bisimulation-based equational reasoning. The syntax and semantics of Imp is...

19 | TrueSkill™: A Bayesian skill rating system
- Herbrich, Minka, et al.
- 2006
Citation Context ...ce algorithms [15]. This paper borrows ideas from Csoft and extends them, placing the semantics on a firm footing. Bayesian Models as Probabilistic Expressions Consider a simplified form of TrueSkill [11], a large-scale online system for ranking computer gamers. There is a population of players, each assumed to have a skill, which is a real number that cannot be directly observed. We observe skills on...

19 | Bayesian Modeling using WinBUGS
- Ntzoufras
- 2009
Citation Context ... methods. Blaise [4] supports the compositional construction of sophisticated probabilistic models, and decouples the choice of inference algorithm from the specification of the distribution. WinBUGS [28] is a popular language for explicitly describing distributions suitable for Monte Carlo analysis. Other Uses of Probabilistic Languages Probabilistic languages with formal semantics find application i...

17 | Labelled Markov Processes
- Panangaden
- 2009
Citation Context ...chine learning, including databases [6], model checking [19], differential privacy [24, 34], information flow [20], and cryptography [1]. A recent monograph on semantics for labelled Markov processes [29] focuses on bisimulation-based equational reasoning. The syntax and semantics of Imp is modelled on an existing probabilistic language [2] without observations. Erwig and Kollmansberger [7] describe a...

12 | Composable Probabilistic Inference with Blaise
- Bonawitz
- 2008

11 | Probabilistic LCF
- Saheb-Djahromi
- 1978
Citation Context ...guages There is a long history of formal semantics for probabilistic languages with sampling primitives, often combined with recursive computation. One of the first semantics is for Probabilistic LCF [35], which augments the core functional language LCF with weighted binary choice, for discrete distributions. Kozen [17] develops a probabilistic semantics for while-programs augmented with random assign...

6 | Functional pearls: Probabilistic functional programming in Haskell
- Erwig, Kollmansberger
- 2006
Citation Context ...processes [29] focuses on bisimulation-based equational reasoning. The syntax and semantics of Imp is modelled on an existing probabilistic language [2] without observations. Erwig and Kollmansberger [7] describe a library for probabilistic functional programming in Haskell. The library is based on the probability monad, and uses a finite representation suitable for small discrete distributions; the ...

5 | Monolingual probabilistic programming using generalized coroutines
- Kiselyov, Shan
- 2009

4 | Probability Theory: The Logic of Science, chapter 15.7 The Borel-Kolmogorov paradox
- Jaynes
- 2003
Citation Context ...he reason for not instead writing x=y is that conditioning on events of zero probability without specifying the random variable they are drawn from is not in general well-defined, cf. Borel’s paradox [12]. To avoid this issue, we instead observe the random variable x−y of type real, at the value 0. To give a formal semantics to such observations, as well as to mixtures of continuous and discrete distr...

3 | On the definition of probability densities and sufficiency of the likelihood map
- Fraser, McDunnough, et al.
- 1995
Citation Context .... Given a measure µ on T[[t]] and a measurable function p : t → b, we consider the family of events p(x) = c where c ranges over $V_b$. We define $\dot{\mu}[A \,\|\, p = c] \in \mathbb{R}$ (the µ-density at p = c of A) following [8], by: Conditional Density: $\dot{\mu}[A \,\|\, p = c] \triangleq \lim_{i \to \infty} \mu(A \cap p^{-1}(B_i)) \,/ \int_{B_i} 1 \, d\lambda$ if the limit exists and is the same for all sequences $\{B_i\}$ of closed sets converging regularly to c. Where defi...

2 | Software available from http://research.microsoft.com/infernet
- Minka, Winn, et al.
- 2009
Citation Context ...aining in popularity [31, 30, 9, 4, 14]. In particular, we draw inspiration from Csoft [37], an imperative language with an informal probabilistic semantics. Csoft is the native language of Infer.NET [25], a software library for Bayesian reasoning. A compiler turns Csoft programs into factor graphs [18], data structures that support efficient inference algorithms [15]. This paper borrows ideas from Cs...

2 | Statistical Relational Learning, chapter The design and implementation of IBAL: A General-Purpose Probabilistic Language
- Pfeffer
- 2007
Citation Context ...s [18, 15] are an efficient alternative to Monte Carlo sampling. To the best of our knowledge, all prior inference techniques for probabilistic languages, apart from Csoft and recent versions of IBAL [32], are based on nondeterministic inference using some form of Monte Carlo sampling. The benefit of using factor graphs in Csoft is to support deterministic but approximative inference algorithms, which...

2 | Probabilistic programming with Infer.NET. Machine Learning Summer School lecture notes, available at http://research.microsoft.com/~minka
- Winn, Minka
- 2009
Citation Context ...of writing Bayesian models as probabilistic programs, which was pioneered by Koller et al. [16] and is recently gaining in popularity [31, 30, 9, 4, 14]. In particular, we draw inspiration from Csoft [37], an imperative language with an informal probabilistic semantics. Csoft is the native language of Infer.NET [25], a software library for Bayesian reasoning. A compiler turns Csoft programs into facto...