Results 1 - 10 of 14
Strictly Proper Scoring Rules, Prediction, and Estimation
, 2007
Cited by 86 (13 self)
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the Savage representation. Examples of scoring rules for probabilistic forecasts in the form of predictive densities include the logarithmic, spherical, pseudospherical, and quadratic scores. The continuous ranked probability score applies to probabilistic forecasts that take the form of predictive cumulative distribution functions. It generalizes the absolute error and forms a special case of a new and very general type of score, the energy score. Like many other scoring rules, the energy score admits a kernel representation in terms of negative definite functions, with links to inequalities of Hoeffding type, in both univariate and multivariate settings. Proper scoring rules for quantile and interval forecasts are also discussed. We relate proper scoring rules to Bayes factors and to cross-validation, and propose a novel form of cross-validation known as random-fold cross-validation.
A case study on probabilistic weather forecasts in the North American Pacific Northwest illustrates the importance of propriety. We note optimum score approaches to point and quantile
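As an illustration of the abstract's remark that the continuous ranked probability score generalizes the absolute error, here is a minimal sketch that evaluates the score for a finite ensemble through the kernel form E|X - y| - (1/2) E|X - X'|. The function name `crps_ensemble`, the treatment of the ensemble as an equally weighted empirical distribution, and the negative orientation (lower is better, the opposite sign of the positively oriented convention) are assumptions made for this example, not details taken from the article.

```python
def crps_ensemble(members, y):
    """CRPS of an equally weighted ensemble against the observation y.

    Uses the kernel form E|X - y| - 0.5 * E|X - X'|, where X and X'
    are independent draws from the ensemble's empirical distribution.
    Returned as a loss: smaller values indicate a better forecast.
    """
    n = len(members)
    # Mean absolute distance between ensemble members and the observation.
    accuracy_term = sum(abs(x - y) for x in members) / n
    # Mean absolute distance between all pairs of ensemble members.
    spread_term = sum(abs(a - b) for a in members for b in members) / (n * n)
    return accuracy_term - 0.5 * spread_term


# A one-member "ensemble" is a point forecast: the score is |2 - 5|.
print(crps_ensemble([2.0], 5.0))
# A two-member ensemble straddling the observation.
print(crps_ensemble([0.0, 2.0], 1.0))
```

With a single member the spread term vanishes, so the score collapses to the absolute error of the point forecast.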
Interpretation Of Rank Histograms For Verifying Ensemble Forecasts
, 2000
Cited by 28 (1 self)
Rank histograms are a tool for evaluating ensemble forecasts. They are useful for determining the reliability of ensemble forecasts and for diagnosing errors in their mean and spread. Rank histograms are generated by repeatedly tallying the rank of the verification (usually, an observation) relative to values from an ensemble sorted from lowest to highest. However, an uncritical use of the rank histogram can lead to misinterpretations of the qualities of that ensemble. For example, a flat rank histogram, usually taken as a sign of reliability, can still be generated from unreliable ensembles. Similarly, a U-shaped rank histogram, commonly understood as indicating a lack of variability in the ensemble, can also be a sign of conditional bias. It is also shown that flat rank histograms can be generated for some model variables if the variance of the ensemble is correctly specified, yet if covariances between model grid points are improperly specified, rank histograms for combinations of mo...
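The tallying procedure described in this abstract can be sketched in a few lines. The function name `rank_histogram` and the handling of ties (a member equal to the observation is counted as not below it) are assumptions for illustration; the abstract does not specify either.

```python
def rank_histogram(ensembles, observations):
    """Tally the rank of each observation within its ensemble.

    ensembles    : list of equal-length lists of member values
    observations : list of verifying values, one per ensemble
    Returns a list of n_members + 1 counts; bin k holds the cases
    in which exactly k members fell below the observation.
    """
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensembles, observations):
        # Rank = number of members strictly below the verification.
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    return counts


# Three cases with the same 3-member ensemble: the observation falls
# below all members, between the 2nd and 3rd, and above all members.
cases = [[1.0, 2.0, 3.0]] * 3
print(rank_histogram(cases, [0.5, 2.5, 10.0]))
```

Over many cases, counts that pile up in the outermost bins produce the U shape mentioned above, while roughly equal counts give a flat histogram.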
Simulation of interannual variability of tropical storm frequency in an ensemble of GCM integrations
- J. Climate
, 1997
Cited by 12 (4 self)
The present study examines the simulation of the number of tropical storms produced in GCM integrations with a prescribed SST. A 9-member ensemble of 10-yr integrations (1979–88) of a T42 atmospheric model forced by observed SSTs has been produced; each ensemble member differs only in the initial atmospheric conditions. An objective procedure for tracking model-generated tropical storms is applied to this ensemble during the last 9 yr of the integrations (1980–88). The seasonal and monthly variations of tropical storm numbers are compared with observations for each ocean basin. Statistical tools such as the chi-square test, the F test, and the t test are applied to the ensemble number of tropical storms, leading to the conclusion that the potential predictability is particularly strong over the western North Pacific and the eastern North Pacific, and to a lesser extent over the western North Atlantic. A set of tools including the joint probability distribution and the ranked probability score is used to evaluate the simulation skill of this ensemble simulation. The simulation skill over the western North Atlantic basin appears to be exceptionally high, particularly during years of strong potential predictability.
Predictive model assessment for count data
, 2007
Cited by 5 (0 self)
We discuss tools for the evaluation of probabilistic forecasts and the critique of statistical models for ordered discrete data. Our proposals include a non-randomized version of the probability integral transform, marginal calibration diagrams and proper scoring rules, such as the predictive deviance. In case studies, we critique count regression models for patent data, and assess the predictive performance of Bayesian age-period-cohort models for larynx cancer counts in Germany.
A consolidated CLIPER model for improved August-September ENSO prediction skill
- Wea. Forecasting
, 2004
Cited by 3 (0 self)
A prime challenge for ENSO seasonal forecast models is to predict boreal summer ENSO conditions at lead. August-September ENSO has a strong influence on Atlantic hurricane activity, Northwest Pacific typhoon activity and tropical precipitation. However, summer ENSO skill is low due to the spring predictability barrier between March and May. A 'Consolidated' ENSO-CLIPER seasonal prediction model is presented to address this issue with promising initial results. Consolidated CLIPER comprises the ensemble of 18 model variants of the statistical ENSO-CLIPER (CLImatology and PERsistence) prediction model. Assessing August-September ENSO skill using deterministic and probabilistic skill measures applied to cross-validated hindcasts for 1952-2002, and using deterministic skill measures applied to replicated real-time forecasts for 1900-1950, shows that the consolidated CLIPER model consistently outperforms the standard CLIPER model at leads from 2 to 6 months for all the main ENSO indices (3, 3.4 and 4). The consolidated CLIPER August-September 1952-2002 hindcast skill is also positive to 97.5% confidence at leads out to 4 months (early April) for all ENSO indices. Optimisation of the consolidated CLIPER model may lead to further skill improvements.
Diagnostic verification of the Climate Prediction Center long-lead outlooks
- J. Climate
, 2000
Cited by 2 (0 self)
The performance of the Climate Prediction Center’s long-lead forecasts for the period 1995–98 is assessed through a diagnostic verification, which involves examination of the full joint frequency distributions of the forecasts and the corresponding observations. The most striking results of the verifications are the strong cool and dry biases of the outlooks. These seem clearly related to the 1995–98 period being warmer and wetter than the 1961–90 climatological base period. This bias results in the ranked probability score indicating very low skill for both temperature and precipitation forecasts at all leads. However, the temperature forecasts at all leads, and the precipitation forecasts for leads up to a few months, exhibit very substantial resolution: low (high) forecast probabilities are consistently associated with lower (higher) than average relative frequency of event occurrence, even though these relative frequencies are substantially different (because of the unconditional biases) from the forecast probabilities. Conditional biases, related to systematic under- or overconfidence on the part of the forecasters, are also evident in some circumstances.
A New Probabilistic Approach in Rank Regression with Optimal Bayesian Partitioning
by Carine Hue
Cited by 2 (0 self)
In this paper, we consider the supervised learning task of predicting the normalized rank of a numerical variable. We introduce a novel probabilistic approach to estimate the posterior distribution of the target rank conditional on the predictors. We turn this learning task into a model selection problem. To do so, we define a 2D partitioning family obtained by discretizing numerical variables and grouping categorical ones, and we derive an analytical criterion to select the partition with the highest posterior probability. We show how these partitions can be used to build univariate predictors, and multivariate ones under a naive Bayes assumption. We also propose a new evaluation criterion for probabilistic rank estimators. Based on the logarithmic score, this criterion has the advantage of being bounded below, which is not the case for the logarithmic score computed for probabilistic value estimators. A first set of experiments on synthetic data shows the good properties of the proposed criterion and of our partitioning approach. A second set of experiments on real data shows competitive performance of the univariate and selective naive Bayes rank estimators, projected on the value range, compared to methods submitted to a recent challenge on probabilistic metric regression tasks. Our approach is applicable to all regression problems with categorical or numerical predictors. It is particularly interesting for those with a high number of predictors, as it automatically detects the variables that contain predictive information. It builds pertinent predictors of the normalized rank of the numerical target from one or several predictors. Because the selection criterion is regularized by the presence of a prior and a posterior term, it does not suffer from overfitting.
Scoring Rules and Survey Density Forecasts
, 2009
Cited by 1 (0 self)
This paper provides a practical evaluation of some leading density forecast scoring rules in the context of forecast surveys. We analyse the density forecasts of UK inflation obtained from the Bank of England’s Survey of External Forecasters, considering both the survey average forecasts published in the Bank’s quarterly Inflation Report, and the individual survey responses recently made available to researchers by the Bank. The density forecasts are collected in histogram format, and the ranked probability score (RPS) is shown to have clear advantages over other scoring rules. Missing observations are a feature of forecast surveys, and we introduce an adjustment to the RPS, based on the Yates decomposition, to improve its comparative measurement of forecaster performance in the face of differential non-response. The new measure, denoted RPS*, is recommended to analysts of forecast surveys.
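For readers unfamiliar with the ranked probability score applied to histogram forecasts, the following is a minimal sketch of the standard, unadjusted RPS: the sum of squared differences between the cumulative forecast probabilities and the cumulative outcome indicator across ordered bins. The function name `ranked_probability_score` and the choice not to normalize by the number of bins are assumptions for this example. The RPS* adjustment for non-response mentioned in the abstract is not reproduced here.

```python
def ranked_probability_score(probs, outcome_bin):
    """Unnormalized RPS for a histogram forecast over ordered bins.

    probs       : forecast probabilities per bin, summing to 1
    outcome_bin : zero-based index of the bin containing the outcome
    Lower is better; a forecast that puts all mass on the outcome
    bin scores 0.
    """
    score = 0.0
    cumulative_forecast = 0.0
    for k, p in enumerate(probs):
        cumulative_forecast += p
        # Cumulative outcome is 0 before the observed bin, 1 from it on.
        cumulative_outcome = 1.0 if k >= outcome_bin else 0.0
        score += (cumulative_forecast - cumulative_outcome) ** 2
    return score


# The outcome lands in the last of three bins. Mass placed two bins
# away is penalized more than mass placed in the adjacent bin.
print(ranked_probability_score([1.0, 0.0, 0.0], 2))
print(ranked_probability_score([0.0, 1.0, 0.0], 2))
```

Because the score works on cumulative probabilities, it rewards forecasts that place probability close to the bin in which the outcome falls, not only in that bin itself.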

