Strictly Proper Scoring Rules, Prediction, and Estimation
, 2007
Cited by 86 (13 self)
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the Savage representation. Examples of scoring rules for probabilistic forecasts in the form of predictive densities include the logarithmic, spherical, pseudospherical, and quadratic scores. The continuous ranked probability score applies to probabilistic forecasts that take the form of predictive cumulative distribution functions. It generalizes the absolute error and forms a special case of a new and very general type of score, the energy score. Like many other scoring rules, the energy score admits a kernel representation in terms of negative definite functions, with links to inequalities of Hoeffding type, in both univariate and multivariate settings. Proper scoring rules for quantile and interval forecasts are also discussed. We relate proper scoring rules to Bayes factors and to cross-validation, and propose a novel form of cross-validation known as random-fold cross-validation.
A case study on probabilistic weather forecasts in the North American Pacific Northwest illustrates the importance of propriety. We note optimum score approaches to point and quantile
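The statement that the continuous ranked probability score generalizes the absolute error can be checked numerically. The following is a minimal sketch, not code from the article: it assumes the forecast is given as a finite sample (an empirical distribution), uses the kernel form E|X - y| - 0.5 E|X - X'|, and scores in negative orientation (lower is better), which is a convention chosen here. The function name `crps_sample` is hypothetical.

```python
from itertools import product


def crps_sample(samples, y):
    """CRPS of the empirical distribution of `samples` at observation y.

    Kernel form: E|X - y| - 0.5 * E|X - X'|, where X and X' are
    independent draws from the forecast. Lower is better here.
    """
    n = len(samples)
    # Average distance between forecast draws and the observation.
    term_obs = sum(abs(x - y) for x in samples) / n
    # Average distance between two independent forecast draws.
    term_spread = sum(abs(a - b) for a, b in product(samples, repeat=2)) / (n * n)
    return term_obs - 0.5 * term_spread


# A point forecast has zero spread, so the score is the absolute error |3 - 5|.
point = crps_sample([3.0], 5.0)

# A three-member forecast centered on the observation scores 2/9.
ens = crps_sample([4.0, 5.0, 6.0], 5.0)
```

The double sum costs O(n^2) in the sample size, which is fine for small ensembles; larger samples would call for a sorted-sample formula instead.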
Bayesian Modeling of Uncertainty in Ensembles of Climate Models
, 2008
Cited by 8 (3 self)
Projections of future climate change caused by increasing greenhouse gases depend critically on numerical climate models coupling the ocean and atmosphere (GCMs). However, different models differ substantially in their projections, which raises the question of how the different models can best be combined into a probability distribution of future climate change. For this analysis, we have collected both current and future projected mean temperatures produced by nine climate models for 22 regions of the earth. We also have estimates of current mean temperatures from actual observations, together with standard errors, that can be used to calibrate the climate models. We propose a Bayesian analysis that allows us to combine the different climate models into a posterior distribution of future temperature increase, for each of the 22 regions, while allowing for the different climate models to have different variances. Two versions of the analysis are proposed, a univariate analysis in which each region is analyzed separately, and a multivariate analysis in which the 22 regions are combined into an overall statistical model. A cross-validation approach is proposed to confirm the reasonableness of our Bayesian predictive distributions. The results of this analysis allow for a quantification of the uncertainty of climate model projections as a Bayesian posterior distribution, substantially extending previous approaches to uncertainty in climate models.
Evaluating Density Forecasts: Forecast Combinations, Model Mixtures, Calibration and Sharpness
, 2008
Cited by 7 (5 self)
In a recent article Gneiting, Balabdaoui and Raftery (JRSSB, 2007) propose the criterion of sharpness for the evaluation of predictive distributions or density forecasts. They motivate their proposal by an example in which standard evaluation procedures based on probability integral transforms cannot distinguish between the ideal forecast and several competing forecasts. In this paper we show that their example has some unrealistic features from the perspective of the time-series forecasting literature, hence it is an insecure foundation for their argument that existing calibration procedures are inadequate in practice. We present an alternative, more realistic example in which relevant statistical methods, including information-based methods, provide the required discrimination between competing forecasts. We conclude that there is no need for a subsidiary criterion of sharpness.
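The probability integral transform (PIT) mentioned above can be illustrated with a small simulation. This is a sketch under assumptions that are not taken from the paper: observations are standard normal, the "ideal" forecast is N(0, 1), and the competing forecast is an overdispersed N(0, 2^2). If the forecast is ideal, the PIT values are uniform on [0, 1] with variance 1/12; an overdispersed forecast concentrates them around 0.5.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def pit_values(observations, mu, sigma):
    """PIT value F(y) for each observation under a fixed normal forecast."""
    return [normal_cdf(y, mu, sigma) for y in observations]


def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
obs = [rng.gauss(0.0, 1.0) for _ in range(5000)]

pit_ideal = pit_values(obs, 0.0, 1.0)  # forecast equals the data distribution
pit_wide = pit_values(obs, 0.0, 2.0)   # forecast is too dispersed

# Uniform PIT values have variance 1/12; a smaller variance signals
# a hump-shaped histogram, i.e. an overdispersed forecast.
var_ideal = variance(pit_ideal)
var_wide = variance(pit_wide)
```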
Predictive model assessment for count data
, 2007
Cited by 5 (0 self)
Summary. We discuss tools for the evaluation of probabilistic forecasts and the critique of statistical models for ordered discrete data. Our proposals include a non-randomized version of the probability integral transform, marginal calibration diagrams and proper scoring rules, such as the predictive deviance. In case studies, we critique count regression models for patent data, and assess the predictive performance of Bayesian age-period-cohort models for larynx cancer counts in Germany.
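One way a non-randomized PIT for counts can be computed is sketched below. The construction is an assumption about the method rather than a quotation of it: for an observed count y with predictive CDF P, the conditional PIT is taken to be 0 below P(y - 1), 1 above P(y), and linear in between, and the histogram heights are differences of the averaged function at equally spaced points. The Poisson forecasts, the data, and all function names are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Poisson(lam); returns 0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(Y = i) from P(Y = i - 1)
        total += term
    return total


def conditional_pit(u, y, lam):
    """Non-randomized PIT function for one count y, evaluated at u in [0, 1]."""
    lower = poisson_cdf(y - 1, lam)
    upper = poisson_cdf(y, lam)
    if u <= lower:
        return 0.0
    if u >= upper:
        return 1.0
    return (u - lower) / (upper - lower)


def pit_histogram(counts, lams, bins=10):
    """Bin heights of the averaged conditional PIT over all observations."""
    n = len(counts)

    def mean_pit(u):
        return sum(conditional_pit(u, y, lam) for y, lam in zip(counts, lams)) / n

    return [mean_pit((j + 1) / bins) - mean_pit(j / bins) for j in range(bins)]


# Hypothetical counts and Poisson forecast means, for illustration only.
heights = pit_histogram([0, 2, 1, 4, 3], [1.5, 1.5, 2.0, 3.0, 2.5])
```

Heights close to 1/bins across all bins would be consistent with a calibrated forecast; with only five observations the example is purely mechanical.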
Calibrating Multi-Model Forecast Ensembles with Exchangeable and Missing Members using Bayesian Model Averaging ∗
, 2009
Cited by 4 (1 self)
Sloughter for sharing their insights and providing data. This research was sponsored by the National Science Foundation under Joint Ensemble Forecasting System (JEFS) subaward No. S06-47225 with the University Corporation for Atmospheric Research (UCAR), as well as grants No. ATM-0724721 and No. DMS-0706745.
Bayesian model averaging (BMA) is a statistical postprocessing technique that generates calibrated and sharp predictive probability density functions (PDFs) from forecast ensembles. It represents the predictive PDF as a weighted average of PDFs centered on the bias-corrected ensemble members, where the weights reflect the relative skill of the individual members over a training period. This work adapts the BMA approach to situations that arise frequently in practice, namely, when one or more of the member forecasts are exchangeable, and when there are missing ensemble members. Exchangeable members differ in random perturbations only, such as the members of bred ensembles, singular vector ensembles, or ensemble Kalman filter systems. Accounting for exchangeability simplifies the BMA approach, in that the BMA weights and the parameters of the component PDFs can be assumed to
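The description of the BMA predictive PDF as a weighted average of PDFs centered on bias-corrected members can be written out as a short sketch. Several choices here are assumptions made for illustration and are not stated in the abstract: the component PDFs are Gaussian with a common standard deviation, the bias correction is a simple additive offset per member, and the weights and offsets are given rather than estimated from a training period.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bma_pdf(y, forecasts, weights, biases, sigma):
    """Mixture density: sum_k w_k * N(y; f_k - b_k, sigma^2)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(
        w * normal_pdf(y, f - b, sigma)
        for f, w, b in zip(forecasts, weights, biases)
    )


def bma_mean(forecasts, weights, biases):
    """Mean of the mixture: weighted average of bias-corrected members."""
    return sum(w * (f - b) for f, w, b in zip(forecasts, weights, biases))


# Hypothetical three-member ensemble, weights, and additive biases.
forecasts = [10.0, 12.0, 11.0]
weights = [0.5, 0.3, 0.2]
biases = [0.5, -0.2, 0.0]

density_at_11 = bma_pdf(11.0, forecasts, weights, biases, sigma=1.5)
mean = bma_mean(forecasts, weights, biases)  # 0.5*9.5 + 0.3*12.2 + 0.2*11.0
```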
Probabilistic Quantitative Precipitation Forecasting using a Two-Stage Spatial Model
, 2008
Cited by 1 (0 self)
Multidisciplinary University Research Initiative (MURI) program administered by the Office of
Combining Probability Forecasts
, 2008
Linear pooling is by far the most popular method for combining probability forecasts. However, any nontrivial weighted average of two or more distinct, calibrated probability forecasts is necessarily uncalibrated and lacks sharpness. In view of this, linear pooling requires recalibration, even in the ideal case in which the individual forecasts are calibrated. Toward this end, we propose a beta transformed linear opinion pool (BLP) for the aggregation of probability forecasts from distinct, calibrated or uncalibrated sources. The BLP method fits an optimal nonlinearly recalibrated forecast combination, by compositing a beta transform and the traditional linear opinion pool. The technique is illustrated in a simulation example and in a case study on statistical and National Weather Service probability of precipitation forecasts.
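The composition of a beta transform with the linear opinion pool can be sketched as follows. The sketch assumes the beta transform is the CDF of a Beta(alpha, beta) distribution applied to the pooled probability, and it takes the weights and the two beta parameters as given rather than fitting them. The beta CDF is approximated with a midpoint rule so that only the standard library is needed; that approximation assumes alpha >= 1 and beta >= 1.

```python
import math


def beta_cdf(x, alpha, beta, steps=2000):
    """Beta(alpha, beta) CDF at x by the midpoint rule (assumes alpha, beta >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint of the i-th subinterval of [0, x]
        total += t ** (alpha - 1.0) * (1.0 - t) ** (beta - 1.0)
    return norm * total * h


def linear_pool(probs, weights):
    """Traditional linear opinion pool: weighted average of probabilities."""
    return sum(w * p for w, p in zip(weights, probs))


def beta_transformed_linear_pool(probs, weights, alpha, beta):
    """Apply the beta CDF to the linearly pooled probability."""
    return beta_cdf(linear_pool(probs, weights), alpha, beta)


probs = [0.9, 0.7]    # two hypothetical probability forecasts
weights = [0.5, 0.5]  # hypothetical pooling weights

pooled = linear_pool(probs, weights)                           # about 0.8
identity = beta_transformed_linear_pool(probs, weights, 1, 1)  # alpha = beta = 1 leaves the pool unchanged
sharpened = beta_transformed_linear_pool(probs, weights, 2, 2) # about 0.896, pushed away from 0.5
```

With alpha = beta > 1 the transform moves pooled probabilities away from 0.5, which is one way a recalibration step could restore sharpness lost by averaging.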
Probabilistic Forecasts of Wind Speed: Ensemble Model Output Statistics using Heteroskedastic Censored Regression
, 2008
As wind energy penetration continues to grow, there is a critical need for probabilistic forecasts of wind resources. In addition, there are many other societally relevant uses for forecasts of wind speed, ranging from aviation to ship routing and recreational boating. Over the past two decades, ensembles of numerical weather prediction (NWP) models have been developed, in which multiple estimates of the current state of the atmosphere are used to generate a collection of deterministic forecasts. However, even state-of-the-art ensemble systems are uncalibrated and biased. Here we propose a novel way of statistically post-processing NWP ensembles for wind speed using heteroskedastic censored (Tobit) regression, where location and spread derive from the ensemble forecast. The resulting ensemble model output statistics (EMOS) method is applied to 48-hour ahead forecasts of maximum wind speed over the North American Pacific Northwest in 2003 using the University of Washington Mesoscale Ensemble. The statistically post-processed EMOS density forecasts turn out to be calibrated and sharp, and result in substantial improvement over the unprocessed NWP ensemble or climatological reference forecasts.
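A heteroskedastic censored regression forecast of this kind might look like the sketch below. The abstract only says that location and spread derive from the ensemble; the specific links used here (location linear in the ensemble mean, variance linear in the ensemble variance), the coefficients `a`, `b`, `c`, `d`, and the ensemble values are all hypothetical. The predictive variable is taken to be max(0, Z) for a latent normal Z, so it has a point mass at zero wind speed.

```python
import math
from statistics import mean, pvariance


def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def emos_parameters(ensemble, a, b, c, d):
    """Latent location and scale from the ensemble (assumed link functions)."""
    mu = a + b * mean(ensemble)
    sigma = math.sqrt(c + d * pvariance(ensemble))
    return mu, sigma


def censored_cdf(y, mu, sigma):
    """CDF of max(0, Z) with Z ~ N(mu, sigma^2); zero for negative y."""
    if y < 0.0:
        return 0.0
    return normal_cdf((y - mu) / sigma)


def prob_exceeds(threshold, ensemble, a, b, c, d):
    """Predictive probability that wind speed exceeds a nonnegative threshold."""
    mu, sigma = emos_parameters(ensemble, a, b, c, d)
    return 1.0 - censored_cdf(threshold, mu, sigma)


# Hypothetical ensemble of wind speed forecasts and regression coefficients.
ensemble = [4.0, 6.0, 8.0]
mu, sigma = emos_parameters(ensemble, a=0.5, b=0.9, c=1.0, d=0.5)
p_calm = censored_cdf(0.0, mu, sigma)  # point mass at zero
p_gale = prob_exceeds(8.0, ensemble, a=0.5, b=0.9, c=1.0, d=0.5)
```

In practice the coefficients would be estimated from training data; the sketch leaves that step out.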
Evaluating Density Forecasts: Is Sharpness Needed?
, 2008
Summary. In a recent article Gneiting, Balabdaoui and Raftery (Journal of the Royal Statistical Society B, 2007) propose the criterion of sharpness for the evaluation of predictive distributions or density forecasts. They motivate their proposal by an example in which standard evaluation procedures based on probability integral transforms cannot distinguish between the ideal forecast and several competing forecasts. In this paper we show that their example has some unrealistic features which make it an insecure foundation for their argument that existing calibration procedures are inadequate in practice. We present an alternative, more realistic example in which relevant statistical methods provide the required discrimination between competing forecasts, and argue that there is no need for a subsidiary criterion of sharpness.

