Results 1–10 of 46
Strictly Proper Scoring Rules, Prediction, and Estimation
, 2007
Abstract

Cited by 158 (17 self)
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the Savage representation. Examples of scoring rules for probabilistic forecasts in the form of predictive densities include the logarithmic, spherical, pseudospherical, and quadratic scores. The continuous ranked probability score applies to probabilistic forecasts that take the form of predictive cumulative distribution functions. It generalizes the absolute error and forms a special case of a new and very general type of score, the energy score. Like many other scoring rules, the energy score admits a kernel representation in terms of negative definite functions, with links to inequalities of Hoeffding type, in both univariate and multivariate settings. Proper scoring rules for quantile and interval forecasts are also discussed. We relate proper scoring rules to Bayes factors and to cross-validation, and propose a novel form of cross-validation known as random-fold cross-validation. A case study on probabilistic weather forecasts in the North American Pacific Northwest illustrates the importance of propriety. We note optimum score approaches to point and quantile
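The propriety property described in this abstract can be illustrated with a small sketch (function names and the grid search are ours, not the paper's; we use the negatively oriented convention in which smaller scores are better, whereas the paper states scores to be maximized): for a binary event with true probability q, the expected quadratic (Brier) score is minimized by the honest forecast p = q.

```python
import numpy as np

def brier_score(p, y):
    """Quadratic (Brier) score for a binary outcome y in {0, 1},
    negatively oriented: smaller is better."""
    return (p - y) ** 2

def expected_brier(q, p):
    """Expected Brier score of forecast p when the event occurs
    with true probability q."""
    return q * brier_score(p, 1) + (1 - q) * brier_score(p, 0)

# Propriety, illustrated on a grid: with true probability q = 0.3,
# the honest forecast p = q minimizes the expected score.
q = 0.3
grid = np.linspace(0.0, 1.0, 101)
best = grid[np.argmin([expected_brier(q, p) for p in grid])]
```

Since the expected score equals (p - q)² + q(1 - q), the minimizer is p = q, so `best` recovers 0.3; an improper rule such as the absolute score |p - y| would instead reward reporting 0 or 1.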
Bayesian Modeling of Uncertainty in Ensembles of Climate Models
, 2008
Abstract

Cited by 29 (7 self)
Projections of future climate change caused by increasing greenhouse gases depend critically on numerical climate models coupling the ocean and atmosphere (GCMs). However, different models differ substantially in their projections, which raises the question of how the different models can best be combined into a probability distribution of future climate change. For this analysis, we have collected both current and future projected mean temperatures produced by nine climate models for 22 regions of the earth. We also have estimates of current mean temperatures from actual observations, together with standard errors, that can be used to calibrate the climate models. We propose a Bayesian analysis that allows us to combine the different climate models into a posterior distribution of future temperature increase, for each of the 22 regions, while allowing for the different climate models to have different variances. Two versions of the analysis are proposed, a univariate analysis in which each region is analyzed separately, and a multivariate analysis in which the 22 regions are combined into an overall statistical model. A cross-validation approach is proposed to confirm the reasonableness of our Bayesian predictive distributions. The results of this analysis allow for a quantification of the uncertainty of climate model projections as a Bayesian posterior distribution, substantially extending previous approaches to uncertainty in climate models.
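The core idea of combining model projections with model-specific variances can be sketched in a heavily stylized form (this is our toy reduction, not the paper's hierarchical model): if each model's projection is treated as normal about a common mean with a known model-specific variance, a flat prior yields a precision-weighted posterior.

```python
import numpy as np

def combine_projections(x, sigma):
    """Stylized combination (not the paper's full Bayesian model):
    projections x_j ~ N(mu, sigma_j^2) with known model-specific
    standard deviations sigma_j and a flat prior on mu give a normal
    posterior for mu with precision-weighted mean."""
    x, sigma = np.asarray(x, float), np.asarray(sigma, float)
    prec = 1.0 / sigma ** 2                  # per-model precisions
    post_var = 1.0 / prec.sum()              # posterior variance of mu
    post_mean = post_var * (prec * x).sum()  # precision-weighted mean
    return post_mean, post_var

mean, var = combine_projections([2.0, 3.0, 4.0], [1.0, 1.0, 1.0])
```

The paper's analysis additionally learns the per-model variances from data and links current-climate skill to the projections, which this two-line posterior deliberately omits.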
Predictive model assessment for count data
, 2007
Abstract

Cited by 12 (1 self)
Summary. We discuss tools for the evaluation of probabilistic forecasts and the critique of statistical models for ordered discrete data. Our proposals include a non-randomized version of the probability integral transform, marginal calibration diagrams and proper scoring rules, such as the predictive deviance. In case studies, we critique count regression models for patent data, and assess the predictive performance of Bayesian age-period-cohort models for larynx cancer counts in Germany.
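For discrete data the usual probability integral transform is not uniform even for an ideal forecast, which motivates the non-randomized version mentioned above. A minimal sketch of that construction (per-observation conditional PIT; the Poisson predictive distribution and all function names are our illustrative choices):

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(Y <= k) for a Poisson(lam) predictive distribution;
    returns 0.0 for k < 0."""
    if k < 0:
        return 0.0
    return sum(exp(-lam) * lam ** j / factorial(j) for j in range(k + 1))

def pit_value(u, y, lam):
    """Non-randomized PIT F_bar(u) for an observed count y: 0 below
    F(y-1), 1 above F(y), and linear in between. Averaging F_bar over
    observations and binning in u yields the PIT histogram."""
    lo, hi = poisson_cdf(y - 1, lam), poisson_cdf(y, lam)
    if u <= lo:
        return 0.0
    if u >= hi:
        return 1.0
    return (u - lo) / (hi - lo)
```

Under the forecast distribution, F_bar is standard uniform in expectation, so deviations of the aggregated histogram from uniformity diagnose miscalibration without the noise introduced by randomization.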
Evaluating Density Forecasts: Forecast Combinations, Model Mixtures, Calibration and Sharpness
, 2008
Abstract

Cited by 11 (5 self)
In a recent article Gneiting, Balabdaoui and Raftery (JRSSB, 2007) propose the criterion of sharpness for the evaluation of predictive distributions or density forecasts. They motivate their proposal by an example in which standard evaluation procedures based on probability integral transforms cannot distinguish between the ideal forecast and several competing forecasts. In this paper we show that their example has some unrealistic features from the perspective of the time-series forecasting literature, and hence is an insecure foundation for their argument that existing calibration procedures are inadequate in practice. We present an alternative, more realistic example in which relevant statistical methods, including information-based methods, provide the required discrimination between competing forecasts. We conclude that there is no need for a subsidiary criterion of sharpness.
A Nonmanipulable Test
 ANNALS OF STATISTICS
, 2009
Abstract

Cited by 7 (1 self)
A test is said to control for type I error if it is unlikely to reject the data-generating process. However, if it is possible to produce stochastic processes at random such that, for all possible future realizations of the data, the selected process is unlikely to be rejected, then the test is said to be manipulable. So, a manipulable test has essentially no capacity to reject a strategic expert. Many tests proposed in the existing literature, including calibration tests, control for type I error but are manipulable. We construct a test that controls for type I error and is non-manipulable.
Calibrating Multi-Model Forecast Ensembles with Exchangeable and Missing Members using Bayesian Model Averaging
, 2009
Abstract

Cited by 7 (4 self)
Sloughter for sharing their insights and providing data. This research was sponsored by the National Science Foundation under Joint Ensemble Forecasting System (JEFS) subaward No. S0647225 with the University Corporation for Atmospheric Research (UCAR), as well as grants No. ATM-0724721 and No. DMS-0706745. Bayesian model averaging (BMA) is a statistical post-processing technique that generates calibrated and sharp predictive probability density functions (PDFs) from forecast ensembles. It represents the predictive PDF as a weighted average of PDFs centered on the bias-corrected ensemble members, where the weights reflect the relative skill of the individual members over a training period. This work adapts the BMA approach to situations that arise frequently in practice, namely, when one or more of the member forecasts are exchangeable, and when there are missing ensemble members. Exchangeable members differ in random perturbations only, such as the members of bred ensembles, singular vector ensembles, or ensemble Kalman filter systems. Accounting for exchangeability simplifies the BMA approach, in that the BMA weights and the parameters of the component PDFs can be assumed to
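The weighted-average structure of the BMA predictive PDF can be written out directly. A minimal sketch for a temperature-like variable (normal components; the weights, bias-correction coefficients a, b, and spread sigma are illustrative placeholders, which in practice are fit on a training period):

```python
import math

def bma_pdf(y, members, weights, sigma, a=0.0, b=1.0):
    """BMA predictive density at y: a weighted average of normal PDFs
    centered on the bias-corrected ensemble members a + b * f_k.
    Weights should sum to one; a, b, sigma, weights are placeholders
    that would be estimated from training data."""
    def normal_pdf(x, mu, s):
        return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return sum(w * normal_pdf(y, a + b * f, sigma)
               for w, f in zip(weights, members))
```

With exchangeable members, the simplification described above amounts to constraining the exchangeable group to share a single weight and a single set of component parameters.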
PROBABILISTIC QUANTITATIVE PRECIPITATION FIELD FORECASTING USING A TWO-STAGE SPATIAL MODEL
, 2008
Abstract

Cited by 3 (2 self)
Short-range forecasts of precipitation fields are needed in a wealth of agricultural, hydrological, ecological and other applications. Forecasts from numerical weather prediction models are often biased and do not provide uncertainty information. Here we present a postprocessing technique for such numerical forecasts that produces correlated probabilistic forecasts of precipitation accumulation at multiple sites simultaneously. The statistical model is a spatial version of a two-stage model that represents the distribution of precipitation by a mixture of a point mass at zero and a Gamma density for the continuous distribution of precipitation accumulation. Spatial correlation is captured by assuming that two Gaussian processes drive precipitation occurrence and precipitation amount, respectively. The first process is latent and drives precipitation occurrence via a threshold. The second process explains the spatial correlation in precipitation accumulation. It is related to precipitation via a site-specific transformation function, so as to retain the marginal right-skewed distribution of precipitation while modeling spatial dependence. Both processes take into account the information contained in the numerical weather forecast and are modeled as stationary isotropic spatial processes with an exponential correlation function. The two-stage spatial model was applied to 48-hour-ahead forecasts of daily precipitation accumulation over the Pacific Northwest
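At a single site, the marginal distribution in this two-stage model is a mixture of a point mass at zero and a Gamma density. A hedged single-site sketch (in the paper, occurrence comes from a thresholded latent Gaussian process; here we collapse that to a wet probability `p_wet`, and all names are our own):

```python
import random

def sample_precip(p_wet, shape, scale, rng):
    """Draw one precipitation accumulation from the marginal two-stage
    model: zero with probability 1 - p_wet (the point mass), otherwise
    a Gamma(shape, scale) amount. rng is a random.Random instance."""
    if rng.random() >= p_wet:
        return 0.0
    return rng.gammavariate(shape, scale)

rng = random.Random(42)
draws = [sample_precip(0.5, 2.0, 3.0, rng) for _ in range(2000)]
```

The spatial version replaces the independent draws by two correlated Gaussian processes, one thresholded for occurrence and one transformed for amount, so nearby sites get dependent wet/dry indicators and accumulations.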
ensembleBMA: An R Package for Probabilistic Forecasting using Ensembles and Bayesian Model Averaging
, 2007
Abstract

Cited by 3 (2 self)
ensembleBMA is a contributed R package for probabilistic forecasting using ensemble post-processing via Bayesian Model Averaging. It provides functions for modeling and forecasting with data that may include missing ensemble member forecasts. The modeling can also account for exchangeable ensemble members. The modeling functions estimate model parameters via the EM algorithm for normal mixture models (appropriate for temperature or pressure) and mixtures of gamma distributions with a point mass at 0 (appropriate for precipitation) from training data. Also included are functions for forecasting from these models, as well as functions for verification to assess forecasting performance. Thanks go to Veronica Berrocal and Patrick Tewson for lending their expertise on a number of important issues, to Michael Polakowski for his work on an earlier version of the package, and to Bobby Yuen for complementary work on ensembleMOS. We are also indebted to Cliff Mass, Jeff Baars, and Eric Grimit for many helpful discussions and for sharing data. Supported by the DoD Multidisciplinary Research Initiative
Combining Probability Forecasts
, 2008
Abstract

Cited by 3 (0 self)
Linear pooling is by far the most popular method for combining probability forecasts. However, any nontrivial weighted average of two or more distinct, calibrated probability forecasts is necessarily uncalibrated and lacks sharpness. In view of this, linear pooling requires recalibration, even in the ideal case in which the individual forecasts are calibrated. Toward this end, we propose a beta-transformed linear opinion pool (BLP) for the aggregation of probability forecasts from distinct, calibrated or uncalibrated sources. The BLP method fits an optimal nonlinearly recalibrated forecast combination, by compositing a beta transform and the traditional linear opinion pool. The technique is illustrated in a simulation example and in a case study on statistical and National Weather Service probability of precipitation forecasts.
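The BLP composition can be sketched in a few lines: linearly pool the component predictive CDFs, then pass the pooled value through a beta CDF. In this sketch (names and parameter values are ours) we fix the beta parameters at (2, 1), whose CDF has the closed form H(x) = x², purely for illustration; in the paper the beta parameters and the pooling weights are fit jointly by maximum likelihood.

```python
from math import erf, sqrt

def normal_cdf(y, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1.0 + erf((y - mu) / (sigma * sqrt(2.0))))

def blp_cdf(y, cdfs, weights):
    """Beta-transformed linear pool: recalibrate the linearly pooled
    CDF value through a beta CDF, here hard-coded to Beta(2, 1)
    with CDF H(x) = x**2 (an illustrative choice, not a fitted one)."""
    pooled = sum(w * F(y) for w, F in zip(weights, cdfs))
    return pooled ** 2  # CDF of Beta(2, 1)

components = [lambda t: normal_cdf(t, 0.0, 1.0),
              lambda t: normal_cdf(t, 1.0, 1.0)]
G = lambda y: blp_cdf(y, components, [0.5, 0.5])
```

Since a beta CDF is a continuous, increasing map of [0, 1] onto [0, 1], the composite G remains a valid CDF; with (alpha, beta) = (1, 1) the transform is the identity and the BLP reduces to the ordinary linear pool.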