## Probabilistic forecasts, calibration and sharpness (2007)

Venue: | Journal of the Royal Statistical Society Series B |

Citations: | 38 - 15 self |

### BibTeX

@ARTICLE{Gneiting07probabilisticforecasts,

author = {Tilmann Gneiting and Fadoua Balabdaoui and Adrian E. Raftery},

title = {Probabilistic forecasts, calibration and sharpness},

journal = {Journal of the Royal Statistical Society Series B},

year = {2007},

pages = {243--268}

}

### Abstract

Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only. A simple theoretical framework allows us to distinguish between probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest. In combination with cross-validation or in the time series context, our proposal provides very general, nonparametric alternatives to the use of information criteria for model diagnostics and model selection.
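
As a rough illustration of two of the diagnostics named in the abstract, the sketch below computes probability integral transform (PIT) values and a simple sharpness summary for two hypothetical forecasters. The simulation design (Gaussian observations with a random mean, an "ideal" forecaster who knows that mean, a "climatological" forecaster who issues the marginal distribution), the function names, and the choice of the 90% central interval width as a sharpness measure are all assumptions made for this example, not details taken from the abstract.

```python
import random
from statistics import NormalDist

def pit_values(forecasts, observations):
    """PIT value F_t(x_t) for each forecast/observation pair."""
    return [F.cdf(x) for F, x in zip(forecasts, observations)]

def pit_histogram(pits, bins=10):
    """Relative frequency of PIT values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pits:
        counts[min(int(p * bins), bins - 1)] += 1
    return [c / len(pits) for c in counts]

def mean_interval_width(forecasts, coverage=0.9):
    """Sharpness summary: average width of the central prediction interval."""
    lo, hi = (1 - coverage) / 2, (1 + coverage) / 2
    return sum(F.inv_cdf(hi) - F.inv_cdf(lo) for F in forecasts) / len(forecasts)

# Assumed simulation: x_t ~ N(mu_t, 1) with mu_t ~ N(0, 1).
rng = random.Random(0)
T = 20000
mus = [rng.gauss(0, 1) for _ in range(T)]
obs = [rng.gauss(mu, 1) for mu in mus]

ideal = [NormalDist(mu, 1) for mu in mus]        # knows mu_t
climatological = [NormalDist(0, 2 ** 0.5)] * T   # marginal law N(0, 2)

hist_ideal = pit_histogram(pit_values(ideal, obs))
hist_clim = pit_histogram(pit_values(climatological, obs))
width_ideal = mean_interval_width(ideal)
width_clim = mean_interval_width(climatological)

print("PIT histogram, ideal:         ", [round(h, 3) for h in hist_ideal])
print("PIT histogram, climatological:", [round(h, 3) for h in hist_clim])
print("Mean 90% interval width:", round(width_ideal, 2), "vs", round(width_clim, 2))
```

In this setup both PIT histograms should come out close to uniform, so the histogram alone cannot separate the two forecasters; the narrower intervals of the ideal forecaster are what distinguish it, which is the sense in which sharpness is assessed subject to calibration.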

### Citations

651 |
Comparing predictive accuracy
- Diebold, Mariano
- 1995
Citation Context ...er to this process (Jolliffe and Stephenson 2003), and much of the underlying methodology has been developed by meteorologists. There is also a relevant strand of work in the econometrics literature (Diebold and Mariano 1995; Christoffersen 1998; Diebold, Gunther and Tay 1998). Murphy and Winkler (1987) proposed a general framework for the evaluation of point forecasts that uses a diagnostic approach based on graphical d... |

255 |
Verification of forecasts expressed in terms of probability
- Brier
- 1950
Citation Context ...is proper, and we rank competing forecast procedures based on its average, $\mathrm{CRPS} = \frac{1}{T} \sum_{t=1}^{T} \mathrm{crps}(F_t, x_t) = \int_{-\infty}^{\infty} \mathrm{BS}(y) \, dy$ (14), where $\mathrm{BS}(y) = \frac{1}{T} \sum_{t=1}^{T} (F_t(y) - 1\{x_t \le y\})^2$ denotes the Brier score (Brier 1950) for probability forecasts of the binary events at the threshold value $y \in \mathbb{R}$. The Brier score allows for the distinction of a calibration component and a refinement component (Murphy 1972; Blattenber... |
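
The identity quoted in this context, that the average continuous ranked probability score equals the integral of the Brier score over all thresholds, can be checked numerically. The following is a minimal sketch under stated assumptions: the Gaussian forecasts, the simulated observations, the integration grid, and all function names are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist

def indicator(condition):
    """1.0 if the binary event occurred, else 0.0."""
    return 1.0 if condition else 0.0

def crps_numeric(F, x, grid, dy):
    """Riemann-sum approximation of the integral of (F(y) - 1{x <= y})^2 dy."""
    return sum((F.cdf(y) - indicator(x <= y)) ** 2 for y in grid) * dy

def brier_score(forecasts, observations, y):
    """BS(y): mean squared error of the forecasts F_t(y) for the event x_t <= y."""
    T = len(forecasts)
    return sum((F.cdf(y) - indicator(x <= y)) ** 2
               for F, x in zip(forecasts, observations)) / T

# Assumed toy data: Gaussian predictive distributions and matching observations.
rng = random.Random(1)
T = 30
mus = [rng.gauss(0, 1) for _ in range(T)]
observations = [rng.gauss(mu, 1) for mu in mus]
forecasts = [NormalDist(mu, 1) for mu in mus]

# Assumed integration grid; the tails beyond +/-12 are negligible here.
dy = 0.01
grid = [-12 + i * dy for i in range(2401)]

# Left-hand side: average of the per-forecast scores.
mean_crps = sum(crps_numeric(F, x, grid, dy)
                for F, x in zip(forecasts, observations)) / T
# Right-hand side: Brier score integrated over the threshold y.
integrated_bs = sum(brier_score(forecasts, observations, y) for y in grid) * dy

print(mean_crps, integrated_bs)
```

The two numbers agree up to floating-point rounding because the identity is simply an exchange of the sum over forecasts and the integral over thresholds.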

236 |
Evaluating density forecasts with applications to financial risk management
- Diebold, Gunther, et al.
- 1998
Citation Context ... each $F_t$ corresponds to a one-step ahead forecast, and checks for the uniformity of the probability integral transform have been supplemented by checks for its independence (Frühwirth-Schnatter 1996; Diebold et al. 1998). Hamill (2001) gave a thought-provoking example of a forecaster for whom the histogram of the PIT values is essentially uniform, even though every single probabilistic forecast is biased. His exampl... |

186 | Bayesian model averaging for linear regression models - Raftery, Madigan, et al. - 1997 |

176 | Evaluating interval forecasts
- Christoffersen
- 1998
Citation Context ...fe and Stephenson 2003), and much of the underlying methodology has been developed by meteorologists. There is also a relevant strand of work in the econometrics literature (Diebold and Mariano 1995; Christoffersen 1998; Diebold, Gunther and Tay 1998). Murphy and Winkler (1987) proposed a general framework for the evaluation of point forecasts that uses a diagnostic approach based on graphical displays, summary meas... |

175 | Remarks on a multivariate transformation - Rosenblatt - 1952 |

171 |
Statistical theory: the prequential approach
- Dawid
- 1984
Citation Context ...recasts characterize and reduce but generally do not eliminate uncertainty. Consequently, forecasts should be probabilistic in nature, taking the form of probability distributions over future events (Dawid 1984). Indeed, over the past two decades the quest for good probabilistic forecasts has become a driving force in meteorology. Major economic forecasts such as the quarterly Bank of England inflation repo... |

171 | Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica - Gelman, Meng, et al. - 1996 |

156 |
Rational decisions
- Good
- 1952
Citation Context ...eresting discussion of the ways in which proper scoring rules encourage sharp forecasts. The logarithmic score is the negative of the logarithm of the predictive density evaluated at the observation (Good 1952; Bernardo 1979). The logarithmic score is proper and has many desirable properties (Roulston and Smith 2002) yet lacks robustness (Selten 1998; Gneiting and Raftery 2004). The continuous ranked proba... |

151 | Strictly proper scoring rules, prediction, and estimation
- Gneiting, Raftery
- 2007
Citation Context ...dictive density evaluated at the observation (Good 1952; Bernardo 1979). The logarithmic score is proper and has many desirable properties (Roulston and Smith 2002) yet lacks robustness (Selten 1998; Gneiting and Raftery 2004). The continuous ranked probability score is defined directly in terms of the predictive cumulative distribution function, $F$, namely as $\mathrm{crps}(F, x) = \int_{-\infty}^{\infty} (F(y) - 1(y \ge x))^2 \, dy$ (12), and provides... |
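
The two scores described in this context can be sketched for a single Gaussian predictive distribution. The example below approximates the continuous ranked probability score by numerical integration of its definition and compares it with the standard closed form for Gaussian predictive distributions, which is not stated in the excerpt and is brought in here as an outside fact. The function names, the forecast N(1, 2²), the observation 2.5, and the integration limits are assumptions for illustration.

```python
import math
from statistics import NormalDist

def crps_numeric(F, x, lo, hi, n=40000):
    """Midpoint-rule approximation of the integral of (F(y) - 1(y >= x))^2 dy."""
    dy = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        step = 1.0 if y >= x else 0.0
        total += (F.cdf(y) - step) ** 2
    return total * dy

def crps_gaussian(mu, sigma, x):
    """Closed-form CRPS for a N(mu, sigma^2) forecast (assumed, not from the excerpt)."""
    z = (x - mu) / sigma
    std = NormalDist()
    return sigma * (z * (2 * std.cdf(z) - 1) + 2 * std.pdf(z) - 1 / math.sqrt(math.pi))

def log_score(F, x):
    """Logarithmic score: negative log of the predictive density at the observation."""
    return -math.log(F.pdf(x))

# Hypothetical forecast and observation.
F = NormalDist(mu=1.0, sigma=2.0)
x = 2.5

numeric = crps_numeric(F, x, lo=-30.0, hi=30.0)
closed = crps_gaussian(1.0, 2.0, x)
logs = log_score(F, x)

print("CRPS (numeric):    ", numeric)
print("CRPS (closed form):", closed)
print("Logarithmic score: ", logs)
```

Both scores are negatively oriented here, so smaller values indicate a better forecast. The numerical and closed-form values should agree to roughly the width of one integration cell, since the integrand jumps at the observation.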

136 | Probability and Finance: It’s Only a Game
- Shafer, Vovk
- 2001
Citation Context ...tudied calibration in the context of probability forecasts for sequences of binary events (DeGroot and Fienberg, 1982; Dawid, 1982, 1985a, b; Oakes, 1985; Schervish, 1985, 1989; Dawid and Vovk, 1999; Shafer and Vovk, 2001; Sandroni et al., 2003). The progress is impressive and culminates in the elegant game theoretic approach of Vovk and Shafer (2005). This views forecasting as a game, with three players: forecaster, ... |

135 |
Bayesian computation and stochastic systems (with discussion), Statistical Science 10
- Besag, Green, et al.
- 1995
Citation Context ...ent is dedicated to probabilistic forecasts of portfolio values (Duffie and Pan 1997; Granger 2006). In the statistical literature, advances in Markov chain Monte Carlo methodology (see, for example, Besag et al. 1995) have led to explosive growth in the use of predictive distributions, mostly in the form of Monte Carlo samples from the posterior predictive distribution of quantities of interest. ... |

126 | Testing Density Forecasts, with Applications to Risk Management - Berkowitz - 2001 |

113 | Bayesianly justifiable and relevant frequency calculations for the applied statistician - Rubin - 1984 |

107 |
An overview of value at risk
- Duffie, Pan
- 1997
Citation Context ...k of England inflation report are issued in terms of predictive distributions, and the rapidly growing area of financial risk management is dedicated to probabilistic forecasts of portfolio values (Duffie and Pan 1997). In the statistical literature, advances in Markov chain Monte Carlo methodology (see, for example, Besag, Green, Higdon and Mengersen 1995) have led to explosive growth in the use of predictive dis... |

82 |
Partial non-Gaussian state space
- Shephard
- 1994
Citation Context ...prediction systems. In the time series context, the predictive framework is natural and the model fit can be assessed through the performance of the time-forward predictive distributions (Smith 1985; Shephard 1994; Frühwirth-Schnatter 1996; Bouwens et al. 2004). In other types of situations, a cross-validatory approach can often be used fruitfully (Dawid 1984a, p. 288; Gneiting and Raftery 2004). Appendix Proo... |

78 |
The Well-Calibrated Bayesian
- Dawid
- 1982
Citation Context ...e interpreted in terms of the equality of actual climatology and forecast climatology. Various authors have studied calibration in the context of probability forecasts for sequences of binary events (Dawid 1982, 1985a, 1985b; Oakes 1985; Schervish 1985, 1989). The progress is impressive and culminates in the paper by Foster and Vohra (1998), who viewed the prediction problem as a game played against natur... |

72 | Using bayesian model averaging to calibrate forecast ensembles - Raftery, Gneiting, et al. - 2005 |

71 |
Expected information as expected utility
- Bernardo
- 1979
Citation Context ...scussion of the ways in which proper scoring rules encourage sharp forecasts. The logarithmic score is the negative of the logarithm of the predictive density evaluated at the observation (Good 1952; Bernardo 1979). The logarithmic score is proper and has many desirable properties (Roulston and Smith 2002) yet lacks robustness (Selten 1998; Gneiting and Raftery 2004). The continuous ranked probability score is... |

70 | Asymptotic calibration - Foster, Vohra - 1998 |

65 | A general framework for forecast verification - Murphy, Winkler - 1987 |

55 | A method for producing and evaluating probabilistic forecasts from ensemble model integrations - Anderson - 1996 |

53 | Forecast Verification: A Practitioner’s Guide in Atmospheric Science - Jolliffe, Stephenson - 2003 |

52 | Interpretation of rank histograms for verifying ensemble forecasts - Hamill - 2001 |

52 | Verification of Eta-RSM short-range ensemble forecasts - Hamill, Colucci - 1997 |

48 |
Axiomatic characterization of the quadratic scoring rule
- Selten
- 1998
Citation Context ...hm of the predictive density evaluated at the observation (Good 1952; Bernardo 1979). The logarithmic score is proper and has many desirable properties (Roulston and Smith 2002) yet lacks robustness (Selten 1998; Gneiting and Raftery 2004). The continuous ranked probability score is defined directly in terms of the predictive cumulative distribution function, $F$, namely as $\mathrm{crps}(F, x) = \int_{-\infty}^{\infty} (F(y) - 1(y \ge x))$... |

47 | Evaluation of probabilistic prediction systems - Talagrand, Vautard, Strauss - 1997 |

45 | Evaluating the Forecast Densities of Linear and Nonlinear Models: Applications to Output Growth and Unemployment - Clements, Smith - 2000 |

44 | Forecasting uncertainties in macroeconomic modelling: An application to the UK economy - Garratt, Lee, et al. - 2003 |

44 |
Probability Forecasting
- Dawid
- 1986
Citation Context ... all proper scoring rules for binary probability forecasts, the Brier score allows for the distinction of a calibration component and a refinement component (Murphy, 1972; DeGroot and Fienberg, 1983; Dawid, 1986). Candille and Talagrand (2005) discussed calibration–sharpness decompositions of the continuous ranked probability score. Table 5 shows the logarithmic score and the continuous ranked probability sc... |

38 | Chi-squared tests of interval and density forecasts, and the Bank of England’s fan charts - Wallis - 2003 |

33 | Prequential probability: Principles and properties. Bernoulli, 5, 125–162. Retrieved from: http://projecteuclid.org/euclid.bj/1173707098
- Dawid, Vovk
- 1999
Citation Context ...ous researchers have studied calibration in the context of probability forecasts for sequences of binary events (DeGroot and Fienberg, 1982; Dawid, 1982, 1985a, b; Oakes, 1985; Schervish, 1985, 1989; Dawid and Vovk, 1999; Shafer and Vovk, 2001; Sandroni et al., 2003). The progress is impressive and culminates in the elegant game theoretic approach of Vovk and Shafer (2005). This views forecasting as a game, with thre... |

32 | Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation - Gneiting, Raftery, et al. - 2005 |

32 | A general method for comparing probability assessors - Schervish - 1989 |

31 | Evaluating probabilistic forecasts using information theory
- Roulston, Smith
- 2002
Citation Context ...ogarithmic score is the negative of the logarithm of the predictive density evaluated at the observation (Good 1952; Bernardo 1979). The logarithmic score is proper and has many desirable properties (Roulston and Smith 2002) yet lacks robustness (Selten 1998; Gneiting and Raftery 2004). The continuous ranked probability score is defined directly in terms of the predictive cumulative distribution function, F, namely as ... |

30 |
The comparison and evaluation of forecasters
- DeGroot, Fienberg
- 1983
Citation Context ...e threshold value y ∈ R. Like all proper scoring rules for binary probability forecasts, the Brier score allows for the distinction of a calibration component and a refinement component (Murphy, 1972; DeGroot and Fienberg, 1983; Dawid, 1986). Candille and Talagrand (2005) discussed calibration–sharpness decompositions of the continuous ranked probability score. Table 5 shows the logarithmic score and the continuous ranked p... |

29 |
Calibration with many checking rules
- Sandroni, Smorodinsky, et al.
- 2003
Citation Context ...he context of probability forecasts for sequences of binary events (DeGroot and Fienberg, 1982; Dawid, 1982, 1985a, b; Oakes, 1985; Schervish, 1985, 1989; Dawid and Vovk, 1999; Shafer and Vovk, 2001; Sandroni et al., 2003). The progress is impressive and culminates in the elegant game theoretic approach of Vovk and Shafer (2005). This views forecasting as a game, with three players: forecaster, sceptic and reality or ... |

28 | An evaluation of tests of distributional forecasts - Noceti, Smith, et al. - 2003 |

27 | A comparison of financial duration models via density forecasts - Giot, Grammig, et al. - 2004 |

27 | Calibration-based Empirical Probability (with Discussion). Ann Statistics 13, 1251-1285. Reprinted in Probability Concepts, Dialogue and Beliefs, edited by ... - Dawid - 1985 |

26 | Diagnostic verification of probability forecasts - Murphy, Winkler - 1992 |

26 | Predictive density evaluation
- Corradi, Swanson
- 2006
Citation Context ...rlying methodology has been developed by meteorologists. There is also a relevant strand of work in the econometrics literature (Diebold and Mariano, 1995; Christoffersen, 1998; Diebold et al., 1998; Corradi and Swanson, 2006). Murphy and Winkler (1987) proposed a general framework for the evaluation of point forecasts that uses a diagnostic approach based on graphical displays, summary measures and scoring rules. In this... |

23 |
Self-calibrating priors do not exist
- Oakes
- 1985
Citation Context ...the equality of actual climatology and forecast climatology. Various authors have studied calibration in the context of probability forecasts for sequences of binary events (Dawid 1982, 1985a, 1985b; Oakes 1985; Schervish 1985, 1989). The progress is impressive and culminates in the paper by Foster and Vohra (1998), who viewed the prediction problem as a game played against nature as well. Krzysztofowicz ... |

23 | Combining dynamical and statistical ensembles - Roulston, Smith - 2003 |

23 | Good randomized sequential probability forecasting is always possible - Vovk, Shafer - 2005 |

22 | Weather Forecasting for Weather Derivatives - Campbell, Diebold - 2005 |

22 |
The economic value of ensemble forecasts as a tool for risk assessment: From days to decades
- Palmer
- 2002
Citation Context ...osphere, run each of them forward in time using a numerical weather prediction model, and use the resulting set of forecasts as a sample from the predictive distribution of future weather quantities (Palmer 2002). The ... Table 3: Empirical coverages of central prediction intervals. The nominal coverages are 50% and 90%, respectively. Perfect forecaster: 51.2% (50% interval), 90.0% (90% interval). Climatological forecaster: 5... |

22 |
Weather forecasting with ensemble methods
- Gneiting, Raftery
- 2005
Citation Context ... taking the form of probability distributions over future events (Dawid, 1984). Indeed, over the past two decades the quest for good probabilistic forecasts has become a driving force in meteorology (Gneiting and Raftery, 2005). Major economic forecasts such as the quarterly Bank of England inflation report are issued in terms of predictive distributions (Granger, 2006), and the rapidly growing area of financial risk manag... |

20 |
Time series models to simulate and forecast wind speed and wind power. Journal of climate and applied meteorology
- Brown, Katz, et al.
- 1984
Citation Context ...concerns. The prevalent approach to short-range forecasts of wind speed and wind power at prediction horizons up to a few hours is based on on-site observations and autoregressive time series models (Brown et al., 1984). Gneiting et al. (2004) proposed a novel spatiotemporal approach, the regime-switching space–time (RST) method, that merges meteorological and statistical expertise to obtain fully probabilistic for... |

19 | The impossibility of inductive inference - Dawid - 1985 |