## Combining Probability Forecasts (2008)

### BibTeX

@MISC{Ranjan08combiningprobability,
    author = {Roopesh Ranjan and Tilmann Gneiting},
    title = {Combining Probability Forecasts},
    year = {2008}
}

### Abstract

Linear pooling is by far the most popular method for combining probability forecasts. However, any nontrivial weighted average of two or more distinct, calibrated probability forecasts is necessarily uncalibrated and lacks sharpness. Linear pooling therefore requires recalibration, even in the ideal case in which the individual forecasts are calibrated. To this end, we propose a beta-transformed linear opinion pool (BLP) for aggregating probability forecasts from distinct sources, whether calibrated or uncalibrated. The BLP method fits an optimal, nonlinearly recalibrated forecast combination by composing a beta transform with the traditional linear opinion pool. The technique is illustrated in a simulation example and in a case study on statistical and National Weather Service probability of precipitation forecasts.
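The composition described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the "beta transform" is the cumulative distribution function of a beta distribution applied to the linearly pooled probability, and the function names (`linear_pool`, `beta_cdf`, `blp`), the weights, and the parameter values `a` and `b` are hypothetical choices made for the example rather than fitted values. To stay dependency-free, the beta CDF is approximated by Simpson integration and is restricted to `a >= 1` and `b >= 1`.

```python
import math


def linear_pool(probs, weights):
    """Traditional linear opinion pool: a convex combination of forecasts."""
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0:
        raise ValueError("weights must be nonnegative and sum to one")
    return sum(w * p for w, p in zip(weights, probs))


def beta_cdf(x, a, b, n=200):
    """Beta(a, b) CDF by composite Simpson integration (illustrative only).

    Restricted to a, b >= 1 so that the density is bounded on [0, 1];
    real applications would use a library routine instead.
    """
    if a < 1 or b < 1:
        raise ValueError("this sketch only handles a >= 1 and b >= 1")
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))

    def density(t):
        return norm * t ** (a - 1) * (1.0 - t) ** (b - 1)

    h = x / n  # n must be even for Simpson's rule
    total = density(0.0) + density(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * density(i * h)
    return min(1.0, total * h / 3.0)


def blp(probs, weights, a, b):
    """Assumed BLP form: beta CDF applied to the linear opinion pool."""
    return beta_cdf(linear_pool(probs, weights), a, b)


# Two hypothetical forecasts for the same event, equally weighted.
pooled = linear_pool([0.9, 0.7], [0.5, 0.5])
recalibrated = blp([0.9, 0.7], [0.5, 0.5], a=2.0, b=2.0)
print(round(pooled, 3), round(recalibrated, 3))
```

With the illustrative choice `a = b = 2`, the pooled value 0.8 maps to about 0.896, so the transform pushes the combined forecast away from 1/2 and toward the extremes, which is the direction needed to restore sharpness. In practice the weights and the beta parameters would be estimated from data rather than fixed by hand.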

### Citations

214 | Combining Probability Distributions: A Critique and Annotated - Genest, Zidek - 1986 |

200 | Calibration of probabilities: The state of the art to 1980 - Lichtenstein, Fischhoff, et al. - 1982 |

177 | Strictly proper scoring rules, prediction and estimation - Gneiting, Raftery - 2007 |

Citation Context ... the Brier or quadratic score (Brier 1950; Selten 1998) and the logarithmic score (Good 1952) provide summary measures of predictive performance that address calibration and sharpness simultaneously (Gneiting and Raftery 2007). It is therefore critical that probability assessments are aggregated in ways that promote calibrated and sharp combined forecasts. In Section 2 we demonstrate a striking result, in that any weigh...

170 | Rational decisions - Good - 1952 |

Citation Context ...e closer to the most confident values of zero or one, the sharper the forecast. Strictly proper scoring rules such as the Brier or quadratic score (Brier 1950; Selten 1998) and the logarithmic score (Good 1952) provide summary measures of predictive performance that address calibration and sharpness simultaneously (Gneiting and Raftery 2007). It is therefore critical that probability assessments are aggr...

164 | Bayesian model averaging: a tutorial - Hoeting, Madigan, et al. - 1999 |

Citation Context ...perts’ or models’ strengths result in improved predictive performance. This is very much in the spirit of model averaging, which has primarily been developed for the purpose of statistical inference (Hoeting et al. 1999). Various ways of combining probability forecasts into a single aggregated forecast have been proposed. Genest and Zidek (1986), Wallsten et al. (1997), Clemen and Winkler (1999, 2007) and Primo et a...

114 | The Statistical Evaluation of Medical Tests for Classification and Prediction - Pepe - 2003 |

Citation Context ...probability variables since 1968 (Croushore 1993). Of course, there are many other important applications of probability forecasts, including but not limited to medical diagnosis (Wilson et al. 1998; Pepe 2003), educational testing, and political and socio-economic foresight (Tetlock 2005). Arguably, a far-reaching transdisciplinary transition to distributional forecasting is well under way (Gneiting 200...

100 | The use of model output statistics (MOS) in objective weather forecasting - Glahn, Lowry - 1972 |

Citation Context ... NWS forecast. The MOS probability forecasts are statistical forecasts that apply logistic regression techniques to the output of a numerical weather prediction model and recent weather observations (Glahn and Lowry 1972; Wilks 2006). [Figure: reliability diagrams, panels GMOS and EMOS; observed relative frequency versus forecast probability]...

87 | The well-calibrated Bayesian - Dawid - 1982 |

Citation Context ...nd can be traced to Murphy and Winkler (1987) and Schervish (1989). It differs from the game-theoretic approach to calibration that has been developed in a far-reaching, related strand of literature (Dawid 1982; Foster and Vohra 1998; Lehrer 2001; Sandroni, Smorodinsky and Vohra 2003; Vovk and Shafer 2005; Al-Najjar and Weinstein 2008; Feinberg and Stewart 2008). This latter property can be thought of as ...

85 | A general framework for forecast verification - Murphy, Winkler - 1987 |

Citation Context ...ogy (Ariely et al. 2000), and medical diagnosis (Winkler and Poses 1993), among other fields. The goal in probability forecasting is to maximize the sharpness of the forecasts subject to calibration (Murphy and Winkler 1987; Gneiting, Balabdaoui and Raftery 2007). Calibration or reliability measures how close conditional event frequencies are to the forecast probabilities. Sharpness describes how far away the forecasts ...

83 | Introducing: The Survey of Professional Forecasters - Croushore - 1993 |

Citation Context ...the most influential and important event in their development (Murphy 1998; Winkler and Jose 2008). In economics, the Survey of Professional Forecasters has included probability variables since 1968 (Croushore 1993). Of course, there are many other important applications of probability forecasts, including but not limited to medical diagnosis (Wilson et al. 1998; Pepe 2003), educational testing, and political...

74 | A new vector partition of the probability score - Murphy - 1973 |

Citation Context ...rence between the CP forecast and the fitted BLP model is 0.0215. Table 2 shows the mean Brier or quadratic score and its reliability, resolution and uncertainty components for the various forecasts (Murphy 1973; Dawid 1986). Suppose that the probability forecasts $p_t$ for the binary event $y_t$, where $t = 1, \ldots, n$, take discrete values $f_i \in [0, 1]$, where $i = 1, \ldots, I$. Let $n_i$ be the number of times that the ...

47 | Probabilistic forecasts, calibration and sharpness - Gneiting, Balabdaoui, et al. - 2005 |

40 | Prediction of coronary heart disease using risk factor categories. Circulation 97 - Wilson - 1998 |

37 | A general method for comparing probability assessors - Schervish - 1989 |

34 | Verification of Forecasts Expressed - Brier - 1950 |

Citation Context ... extreme the forecast probabilities are, that is, the closer to the most confident values of zero or one, the sharper the forecast. Strictly proper scoring rules such as the Brier or quadratic score (Brier 1950; Selten 1998) and the logarithmic score (Good 1952) provide summary measures of predictive performance that address calibration and sharpness simultaneously (Gneiting and Raftery 2007). It is there...

33 | Probabilistic quantitative precipitation forecasting using Bayesian model averaging - Sloughter, Raftery, et al. - 2007 |

Citation Context ...cky and Fritsch (1995), Ariely et al. (2000), Wallsten and Diederich (2001) and Johnson et al. (2001). Despite their ubiquity, these issues have frequently been overlooked, with some of our own work (Sloughter et al. 2007) being one such example. With a view toward applied forecasting problems, we recommend a transition from the traditional linear opinion pool to the nonlinearly recalibrated, beta-transformed linear...

31 | Calibration with many checking rules - Sandroni, Smorodinsky, et al. - 2003 |

30 | Economic Forecasting - Elliott, Timmermann - 2008 |

26 | Scoring rules and the evaluation of probabilities - Winkler - 1996 |

25 | On subjective probability forecasting - Sanders - 1963 |

Citation Context ... forecasts, which is often referred to as a linear opinion pool. Substantial empirical evidence attests to the benefits of linear opinion pools, with successful applications ranging from meteorology (Sanders 1963; Vislocky and Fritsch 1995; Baars and Mass 2005) to economics (Graham 1996), psychology (Ariely et al. 2000), and medical diagnosis (Winkler and Poses 1993), among other fields. The goal in probabili...

25 | Good Randomized Sequential Probability Forecasting is Always Possible - Vovk, Shafer - 2005 |

Citation Context ...e game-theoretic approach to calibration that has been developed in a far-reaching, related strand of literature (Dawid 1982; Foster and Vohra 1998; Lehrer 2001; Sandroni, Smorodinsky and Vohra 2003; Vovk and Shafer 2005; Al-Najjar and Weinstein 2008; Feinberg and Stewart 2008). This latter property can be thought of as a weak form of calibration, and we refer to it as marginal consistency. It resembles the notion ...

23 | An Introduction to Copulas, 2nd Edition - Nelsen - 2006 |

20 | The effects of averaging subjective probability estimates between and within judges - Ariely, Au, et al. - 2000 |

Citation Context ...s to the benefits of linear opinion pools, with successful applications ranging from meteorology (Sanders 1963; Vislocky and Fritsch 1995; Baars and Mass 2005) to economics (Graham 1996), psychology (Ariely et al. 2000), and medical diagnosis (Winkler and Poses 1993), among other fields. The goal in probability forecasting is to maximize the sharpness of the forecasts subject to calibration (Murphy and Winkler 1987...

20 | Evaluating and combining subjective probability estimates - Wallsten, Budescu, et al. - 1997 |

18 | Opportunities and priorities in a new era for weather and climate services - Dutton - 2002 |

15 | Editorial: Probabilistic forecasting - Gneiting - 2008 |

Citation Context ...8; Pepe 2003), educational testing, and political and socio-economic foresight (Tetlock 2005). Arguably, a far-reaching transdisciplinary transition to distributional forecasting is well under way (Gneiting 2008). In many instances, multiple probability forecasts for the same event are available. In surveys, economic experts might provide diverse probability assessments of a future recession. Distinct numeri...

14 | Comparative Testing of Experts - Al-Najjar, Weinstein - 2008 |

Citation Context ...ach to calibration that has been developed in a far-reaching, related strand of literature (Dawid 1982; Foster and Vohra 1998; Lehrer 2001; Sandroni, Smorodinsky and Vohra 2003; Vovk and Shafer 2005; Al-Najjar and Weinstein 2008; Feinberg and Stewart 2008). This latter property can be thought of as a weak form of calibration, and we refer to it as marginal consistency. It resembles the notion of marginal calibration for pr...

14 | Is a Group of Economists Better than One? Than None? - Graham - 1996 |

Citation Context ... empirical evidence attests to the benefits of linear opinion pools, with successful applications ranging from meteorology (Sanders 1963; Vislocky and Fritsch 1995; Baars and Mass 2005) to economics (Graham 1996), psychology (Ariely et al. 2000), and medical diagnosis (Winkler and Poses 1993), among other fields. The goal in probability forecasting is to maximize the sharpness of the forecasts subject to cal...

13 | Axiomatic Characterization of the Quadratic Scoring - Selten - 1998 |

Citation Context ... forecast probabilities are, that is, the closer to the most confident values of zero or one, the sharper the forecast. Strictly proper scoring rules such as the Brier or quadratic score (Brier 1950; Selten 1998) and the logarithmic score (Good 1952) provide summary measures of predictive performance that address calibration and sharpness simultaneously (Gneiting and Raftery 2007). It is therefore critical...

11 | Improved model output statistics forecasts through model consensus - Vislocky, Fritsch - 1995 |

Citation Context ...ich is often referred to as a linear opinion pool. Substantial empirical evidence attests to the benefits of linear opinion pools, with successful applications ranging from meteorology (Sanders 1963; Vislocky and Fritsch 1995; Baars and Mass 2005) to economics (Graham 1996), psychology (Ariely et al. 2000), and medical diagnosis (Winkler and Poses 1993), among other fields. The goal in probability forecasting is to maximi...

10 | The performance of National Weather Service forecasts compared to operational, consensus, and weighted model output statistics - Baars, Mass - 2005 |

10 | Testing multiple forecasters - Feinberg, Stewart - 2008 |

Citation Context ...en developed in a far-reaching, related strand of literature (Dawid 1982; Foster and Vohra 1998; Lehrer 2001; Sandroni, Smorodinsky and Vohra 2003; Vovk and Shafer 2005; Al-Najjar and Weinstein 2008; Feinberg and Stewart 2008). This latter property can be thought of as a weak form of calibration, and we refer to it as marginal consistency. It resembles the notion of marginal calibration for probabilistic forecasts of co...

7 | The ‘heuristics and biases’ bias in expert elicitation - Kynn - 2008 |

Citation Context ...hich (7) can be enforced by requiring that $\alpha = \beta \geq 1$ (8). If we aim to address the hard-easy effect that has been described in the psychological literature (Lichtenstein, Fischhoff and Phillips 1982; Kynn 2008, p. 253) the fixed point in (7) can be taken to be $x_0 = 3/4$. We now describe how we go about parameter estimation for the BLP model in (6). Suppose that $y_1, \ldots, y_n$ are binary observations in the ...

6 | Calibrating and combining precipitation probability forecasts - Clemen, Winkler - 1987 |

6 | Assessing probability assessors: Calibration and refinement - DeGroot, Fienberg - 1982 |

6 | Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds - Gneiting, Stanberry, et al. - 2008 |

Citation Context ...tional event frequencies are to the forecast probabilities. Sharpness describes how far away the forecasts are from the naive, climatological baseline forecast, that is, the marginal event frequency (Gneiting et al. 2008; Winkler and Jose 2008). The more extreme the forecast probabilities are, that is, the closer to the most confident values of zero or one, the sharper the forecast. Strictly proper scoring rules such...

6 | Expert Political Judgment: How Good Is It? How Can We Know? - Tetlock - 2005 |

6 | Statistical Methods in the Atmospheric Sciences (2nd Edition) - Wilks - 2006 |

Citation Context ... probability forecasts are statistical forecasts that apply logistic regression techniques to the output of a numerical weather prediction model and recent weather observations (Glahn and Lowry 1972; Wilks 2006). [Figure: reliability diagrams, panels GMOS and EMOS; observed relative frequency versus forecast probability]...

5 | Increasing the reliability of reliability diagrams - Bröcker, Smith - 2007 |

4 | The improvement of probability judgements - Lindley - 1982 |

3 | Modeling expert judgements for Bayesian updating - Genest, Schervish - 1985 |

3 | Combination and calibration methods for probabilistic forecasts of binary events - Primo, Ferro, et al. - 2009 |

3 | Understanding Pooled Subjective Probability Estimates - Wallsten, Diederich - 2001 |

2 | Verification: forecast verification utilities. R package version 1.20 - Pocernich - 2008 |

Citation Context ... sample of size 10,000 from the joint distribution of $Y$, $p_1$ and $p_2$ and generate the combined ELP and OLP forecasts. Figure 1 shows empirical calibration curves or reliability diagrams (Sanders 1963; Pocernich 2008) for the four types of forecasts, which plot the conditional empirical event frequency versus the forecast probability. The red circles show the conditional empirical frequency; the broken lines give...

1 | Probability Judgements for Continuous Quantities - Hora - 2004 |

1 | Averaging Probability Judgements: Monte Carlo Analyses of Asymptotic Diagnostic Value - Johnson, Budescu, et al. - 2001 |

1 | Any Inspection is Manipulable - Lehrer - 2001 |

Citation Context ...ler (1987) and Schervish (1989). It differs from the game-theoretic approach to calibration that has been developed in a far-reaching, related strand of literature (Dawid 1982; Foster and Vohra 1998; Lehrer 2001; Sandroni, Smorodinsky and Vohra 2003; Vovk and Shafer 2005; Al-Najjar and Weinstein 2008; Feinberg and Stewart 2008). This latter property can be thought of as a weak form of calibration, and we r...

1 | Doing Something About - Regnier - 2008 |

Citation Context ...amage and hundreds of deaths annually, there is a critical need for calibrated and sharp probabilistic weather forecasts, to allow for optimal decision making under inherent uncertainty (Dutton 2002; Regnier 2008). Baars and Mass (2005) consider probability of precipitation forecasts for 29 meteorological stations at major urban centers spread across the continental US. They compare the performance of individ...

1 | Comments on: Assessing Probabilistic Forecasts of Multivariate Quantities, With an Application to Ensemble Predictions of Surface Winds - Winkler, Jose - 2008 |

Citation Context ...th century, the transition to probability of precipitation forecasts by the US National Weather Service in 1965 was perhaps the most influential and important event in their development (Murphy 1998; Winkler and Jose 2008). In economics, the Survey of Professional Forecasters has included probability variables since 1968 (Croushore 1993). Of course, there are many other important applications of probability forecasts,...