Results 1–10 of 116
Strictly Proper Scoring Rules, Prediction, and Estimation
, 2007
Abstract

Cited by 373 (28 self)
Scoring rules assess the quality of probabilistic forecasts by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the Savage representation. Examples of scoring rules for probabilistic forecasts in the form of predictive densities include the logarithmic, spherical, pseudospherical, and quadratic scores. The continuous ranked probability score applies to probabilistic forecasts that take the form of predictive cumulative distribution functions. It generalizes the absolute error and forms a special case of a new and very general type of score, the energy score. Like many other scoring rules, the energy score admits a kernel representation in terms of negative definite functions, with links to inequalities of Hoeffding type, in both univariate and multivariate settings. Proper scoring rules for quantile and interval forecasts are also discussed. We relate proper scoring rules to Bayes factors and to cross-validation, and propose a novel form of cross-validation known as random-fold cross-validation. A case study on probabilistic weather forecasts in the North American Pacific Northwest illustrates the importance of propriety. We note optimum score approaches to point and quantile ...
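The propriety condition described in this abstract is easy to check numerically. The sketch below (illustrative Python, not taken from the paper) implements the logarithmic and quadratic scores for a categorical variable and verifies that forecasting the true distribution F yields a higher expected score than issuing a rival forecast G:

```python
import numpy as np

def log_score(p, y):
    """Logarithmic score: log-probability assigned to the realized category y."""
    return np.log(p[y])

def quadratic_score(p, y):
    """Quadratic (Brier-type) score: 2*p_y - sum_k p_k^2."""
    return 2 * p[y] - np.sum(p ** 2)

def expected_score(score, forecast, truth):
    """Expected score when the observation is drawn from `truth`."""
    return sum(truth[y] * score(forecast, y) for y in range(len(truth)))

# A true distribution F over three categories and a rival forecast G != F.
F = np.array([0.5, 0.3, 0.2])
G = np.array([0.3, 0.4, 0.3])

# Propriety: issuing F itself maximizes the expected score under F.
for score in (log_score, quadratic_score):
    assert expected_score(score, F, F) > expected_score(score, G, F)
```

Strict propriety would additionally require the inequality to be strict for every G ≠ F, which holds for both scores shown here.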
Bayesian Modeling of Uncertainty in Ensembles of Climate Models
, 2008
Abstract

Cited by 43 (7 self)
Projections of future climate change caused by increasing greenhouse gases depend critically on numerical climate models coupling the ocean and atmosphere (GCMs). However, different models differ substantially in their projections, which raises the question of how the different models can best be combined into a probability distribution of future climate change. For this analysis, we have collected both current and future projected mean temperatures produced by nine climate models for 22 regions of the earth. We also have estimates of current mean temperatures from actual observations, together with standard errors, that can be used to calibrate the climate models. We propose a Bayesian analysis that allows us to combine the different climate models into a posterior distribution of future temperature increase, for each of the 22 regions, while allowing for the different climate models to have different variances. Two versions of the analysis are proposed, a univariate analysis in which each region is analyzed separately, and a multivariate analysis in which the 22 regions are combined into an overall statistical model. A cross-validation approach is proposed to confirm the reasonableness of our Bayesian predictive distributions. The results of this analysis allow for a quantification of the uncertainty of climate model projections as a Bayesian posterior distribution, substantially extending previous approaches to uncertainty in climate models.
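A minimal sketch of the conjugate-normal building block behind analyses of this kind: combining model projections that are allowed different variances via precision weighting. The numbers below are hypothetical and purely illustrative, not values from the paper:

```python
import numpy as np

# Hypothetical projected temperature increases (°C) from four GCMs for one
# region, each with its own variance; numbers are illustrative only.
projections = np.array([2.1, 2.8, 3.4, 2.5])
variances = np.array([0.4, 0.9, 1.2, 0.5])

# Precision-weighted combination: models with smaller variance (higher
# precision) pull the combined estimate more strongly, and the combined
# variance is smaller than any single model's.
precisions = 1.0 / variances
combined_mean = np.sum(precisions * projections) / np.sum(precisions)
combined_var = 1.0 / np.sum(precisions)
```

In the paper's fully Bayesian treatment the variances themselves are unknown and given priors, so the posterior is obtained by simulation rather than this closed form.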
Calibrated probabilistic forecasting at the Stateline wind energy center: The regime-switching space-time (RST) method
 Journal of the American Statistical Association
, 2004
Abstract

Cited by 35 (14 self)
With the global proliferation of wind power, accurate short-term forecasts of wind resources at wind energy sites are becoming paramount. Regime-switching space-time (RST) models merge meteorological and statistical expertise to obtain accurate and calibrated, fully probabilistic forecasts of wind speed and wind power. The model formulation is parsimonious, yet takes account of all the salient features of wind speed: alternating atmospheric regimes, temporal and spatial correlation, diurnal and seasonal nonstationarity, conditional heteroscedasticity, and non-Gaussianity. The RST method identifies forecast regimes at the wind energy site and fits a conditional predictive model for each regime. Geographically dispersed meteorological observations in the vicinity of the wind farm are used as off-site predictors. The RST technique was applied to 2-hour-ahead forecasts of hourly average wind speed at the Stateline wind farm in the US Pacific Northwest. In July 2003, for instance, the RST forecasts had root-mean-square error (RMSE) 28.6% less than the persistence forecasts. For each month in the test period, the RST forecasts had lower RMSE than forecasts using state-of-the-art vector time series techniques. The RST method provides probabilistic forecasts in the form of ...
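The persistence baseline mentioned in the abstract is worth making concrete: it forecasts that the wind speed two hours ahead equals the current observation. A short sketch (synthetic data, not the Stateline series) computing its RMSE, the yardstick the 28.6% improvement is measured against:

```python
import numpy as np

def rmse(forecast, observed):
    """Root-mean-square error of a point forecast."""
    return np.sqrt(np.mean((np.asarray(forecast) - np.asarray(observed)) ** 2))

# Hypothetical hourly average wind speeds (m/s), simulated as a clipped
# random walk purely for illustration.
rng = np.random.default_rng(0)
speeds = np.clip(8 + np.cumsum(rng.normal(0, 0.5, 200)), 0, None)

# Persistence forecast for 2 hours ahead: simply repeat the current value.
observed, persistence = speeds[2:], speeds[:-2]
baseline = rmse(persistence, observed)
# A model whose RMSE is 28.6% below `baseline` would match the skill
# reported for the RST forecasts in July 2003.
```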
Predictive model assessment for count data
, 2007
Abstract

Cited by 34 (5 self)
Summary. We discuss tools for the evaluation of probabilistic forecasts and the critique of statistical models for ordered discrete data. Our proposals include a non-randomized version of the probability integral transform, marginal calibration diagrams and proper scoring rules, such as the predictive deviance. In case studies, we critique count regression models for patent data, and assess the predictive performance of Bayesian age-period-cohort models for larynx cancer counts in Germany.
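One common form of the non-randomized probability integral transform for count data replaces the single randomized PIT value with the conditional cdf of the randomized PIT, averaged over observations. The sketch below (my own illustration with a Poisson predictive distribution standing in for a fitted model, not code from the paper) shows the per-observation building block:

```python
import math

def poisson_cdf(k, lam):
    """P(Y <= k) for a Poisson(lam) variable; k = -1 gives 0 by convention."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

def nonrandomized_pit(u, y, lam):
    """Non-randomized PIT for a count y under a Poisson(lam) predictive
    distribution: the conditional cdf at level u of the randomized PIT,
    which is uniform on (F(y-1), F(y)) given the observation y."""
    lo, hi = poisson_cdf(y - 1, lam), poisson_cdf(y, lam)
    if u <= lo:
        return 0.0
    if u >= hi:
        return 1.0
    return (u - lo) / (hi - lo)

# Averaged over a sample of counts, nonrandomized_pit(u, y, lam) should be
# close to u for every u when the predictive distribution is calibrated.
```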
Default Priors and Predictive Performance in Bayesian Model Averaging, with Application to Growth Determinants
 Journal of Applied Econometrics
, 2011
Abstract

Cited by 30 (7 self)
Bayesian model averaging (BMA) has become widely accepted as a way of accounting for model uncertainty, notably in regression models for identifying the determinants of economic growth. To implement BMA, the user must specify a prior distribution in two parts: a prior for the regression parameters and a prior over the model space. Here we address the issue of which default prior to use for BMA in linear regression. We compare 12 candidate parameter priors: the Unit Information Prior (UIP) corresponding to the BIC or Schwarz approximation to the integrated likelihood, a proper data-dependent prior, and 10 priors considered by Fernandez et al. (2001b). We also compare the uniform model prior to others that favor smaller models. We compare them on the basis of cross-validated predictive performance on a well-known growth dataset and on two simulated examples from the literature. We found that the UIP with uniform model prior generally outperformed the other priors considered. It also identified the largest set of growth determinants. JEL Classification: O51, O52, O53.
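The UIP/BIC connection stated in this abstract gives a particularly simple recipe for posterior model probabilities: exp(-BIC/2) approximates the integrated likelihood, so with a uniform model prior the weights follow by normalization. A minimal sketch with hypothetical BIC values (not figures from the paper):

```python
import numpy as np

# Hypothetical BIC values for three candidate growth regressions.
bic = np.array([412.3, 410.1, 415.8])

# exp(-BIC/2) approximates the integrated likelihood under the UIP; with a
# uniform model prior, posterior model probabilities follow by normalizing.
# Subtracting the minimum BIC first keeps the exponentials stable.
weights = np.exp(-0.5 * (bic - bic.min()))
post_prob = weights / weights.sum()
```

BMA then averages each model's predictions (or coefficient estimates) with these probabilities as weights.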
Evaluating Density Forecasts: Forecast Combinations, Model Mixtures, Calibration and Sharpness
, 2008
Abstract

Cited by 22 (5 self)
In a recent article, Gneiting, Balabdaoui and Raftery (JRSSB, 2007) propose the criterion of sharpness for the evaluation of predictive distributions or density forecasts. They motivate their proposal by an example in which standard evaluation procedures based on probability integral transforms cannot distinguish between the ideal forecast and several competing forecasts. In this paper we show that their example has some unrealistic features from the perspective of the time-series forecasting literature, and is hence an insecure foundation for their argument that existing calibration procedures are inadequate in practice. We present an alternative, more realistic example in which relevant statistical methods, including information-based methods, provide the required discrimination between competing forecasts. We conclude that there is no need for a subsidiary criterion of sharpness.
Bayesian Probabilistic Projections of Life Expectancy for All Countries
, 2010
Abstract

Cited by 18 (9 self)
was supported by NICHD grant R01 HD54511. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Institute of Child Health and Human Development. Also, the views expressed in this paper are those of the authors and do not necessarily reflect the views of the United Nations. Its contents have not been formally edited and cleared by the United Nations. The designations employed and the presentation of material in this paper do not imply the expression of any opinion whatsoever on the part of the United Nations concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The authors are grateful to Leontine ...
Calibrating Multi-Model Forecast Ensembles with Exchangeable and Missing Members using Bayesian Model Averaging
, 2009
Abstract

Cited by 16 (5 self)
Sloughter for sharing their insights and providing data. This research was sponsored by the National Science Foundation under Joint Ensemble Forecasting System (JEFS) subaward No. S0647225 with the University Corporation for Atmospheric Research (UCAR), as well as grants No. ATM-0724721 and No. DMS-0706745. Bayesian model averaging (BMA) is a statistical postprocessing technique that generates calibrated and sharp predictive probability density functions (PDFs) from forecast ensembles. It represents the predictive PDF as a weighted average of PDFs centered on the bias-corrected ensemble members, where the weights reflect the relative skill of the individual members over a training period. This work adapts the BMA approach to situations that arise frequently in practice, namely, when one or more of the member forecasts are exchangeable, and when there are missing ensemble members. Exchangeable members differ in random perturbations only, such as the members of bred ensembles, singular vector ensembles, or ensemble Kalman filter systems. Accounting for exchangeability simplifies the BMA approach, in that the BMA weights and the parameters of the component PDFs can be assumed to ...
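The mixture form of the BMA predictive PDF described above can be sketched directly. The snippet below is a simplified, equal-variance illustration with hypothetical numbers (not the paper's fitted model); exchangeable members share a single weight, split equally among them:

```python
import numpy as np

def bma_pdf(x, members, weights, sigma):
    """BMA predictive density at x: a weighted average of normal PDFs
    centered on bias-corrected ensemble members (equal-variance sketch)."""
    members, weights = np.asarray(members), np.asarray(weights)
    comp = np.exp(-0.5 * ((x - members) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(np.sum(weights * comp))

# Hypothetical bias-corrected temperature forecasts (°C). The first three
# members come from one bred ensemble, so they are exchangeable and share
# one BMA weight of 0.6, split equally.
members = [14.2, 15.1, 14.8, 16.0]
weights = [0.2, 0.2, 0.2, 0.4]
density = bma_pdf(15.0, members, weights, sigma=1.2)
```

A missing member would simply be dropped, with the remaining weights renormalized to sum to one.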
Probabilistic forecasts of wind speed: ensemble model output statistics by using heteroscedastic censored regression
, 2009
A Nonmanipulable Test
 ANNALS OF STATISTICS
, 2009
Abstract

Cited by 11 (1 self)
A test is said to control for type I error if it is unlikely to reject the data-generating process. However, if it is possible to produce stochastic processes at random such that, for all possible future realizations of the data, the selected process is unlikely to be rejected, then the test is said to be manipulable. So a manipulable test has essentially no capacity to reject a strategic expert. Many tests proposed in the existing literature, including calibration tests, control for type I error but are manipulable. We construct a test that controls for type I error and is nonmanipulable.