Results 1 - 10
of
17
Strictly Proper Scoring Rules, Prediction, and Estimation
, 2007
"... Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he ..."
Abstract
-
Cited by 86 (13 self)
- Add to MetaCart
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ̸ = F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the Savage representation. Examples of scoring rules for probabilistic forecasts in the form of predictive densities include the logarithmic, spherical, pseudospherical, and quadratic scores. The continuous ranked probability score applies to probabilistic forecasts that take the form of predictive cumulative distribution functions. It generalizes the absolute error and forms a special case of a new and very general type of score, the energy score. Like many other scoring rules, the energy score admits a kernel representation in terms of negative definite functions, with links to inequalities of Hoeffding type, in both univariate and multivariate settings. Proper scoring rules for quantile and interval forecasts are also discussed. We relate proper scoring rules to Bayes factors and to cross-validation, and propose a novel form of cross-validation known as random-fold cross-validation. A case study on probabilistic weather forecasts in the North American Pacific Northwest illustrates the importance of propriety. We note optimum score approaches to point and quantile
Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications,” manuscript, available at www-stat.wharton.upenn.edu/~buja
, 2005
"... What are the natural loss functions or fitting criteria for binary class probability estimation? This question has a simple answer: so-called “proper scoring rules”, that is, functions that score probability estimates in view of data in a Fisher-consistent manner. Proper scoring rules comprise most ..."
Abstract
-
Cited by 25 (1 self)
- Add to MetaCart
What are the natural loss functions or fitting criteria for binary class probability estimation? This question has a simple answer: so-called “proper scoring rules”, that is, functions that score probability estimates in view of data in a Fisher-consistent manner. Proper scoring rules comprise most loss functions currently in use: log-loss, squared error loss, boosting loss, and as limiting cases cost-weighted misclassification losses. Proper scoring rules have a rich structure: • Every proper scoring rules is a mixture (limit of sums) of cost-weighted misclassification losses. The mixture is specified by a weight function (or measure) that describes which misclassification cost weights are most emphasized by the proper scoring rule. • Proper scoring rules permit Fisher scoring and Iteratively Reweighted LS algorithms for model fitting. The weights are derived from a link function and the above weight function. • Proper scoring rules are in a 1-1 correspondence with information measures for tree-based classification.
Probabilistic forecasts, calibration and sharpness
- Journal of the Royal Statistical Society Series B
, 2007
"... Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive dis ..."
Abstract
-
Cited by 24 (11 self)
- Add to MetaCart
Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only. A simple theoretical framework allows us to distinguish between probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest. In combination with cross-validation or in the time series context, our proposal provides very general, nonparametric alternatives to the use of information criteria for model diagnostics and model selection.
Eliciting Properties of Probability Distributions
- In Proceedings of the ninth ACM conference on electronic commerce
, 2008
"... We investigate the problem of incentivizing an expert to truthfully reveal probabilistic information about a random event. Probabilistic information consists of one or more properties, which are any real-valued functions of the distribution, such as the mean and variance. Not all properties can be e ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
We investigate the problem of incentivizing an expert to truthfully reveal probabilistic information about a random event. Probabilistic information consists of one or more properties, which are any real-valued functions of the distribution, such as the mean and variance. Not all properties can be elicited truthfully. We provide a simple characterization of elicitable properties, and describe the general form of the associated payment functions that induce truthful revelation. We then consider sets of properties, and observe that all properties can be inferred from sets of elicitable properties. This suggests the concept of elicitation complexity for a property, the size of the smallest set implying the property.
Information, Divergence and Risk for Binary Experiments
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2009
"... We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC-curves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives which all ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC-curves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives which all are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
Surrogate Regret Bounds for Proper Losses
"... We present tight surrogate regret bounds for the class of proper (i.e., Fisher consistent) losses. The bounds generalise the margin-based bounds due to Bartlett et al. (2006). The proof uses Taylor’s theorem and leads to new representations for loss and regret and a simple proof of the integral repr ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
We present tight surrogate regret bounds for the class of proper (i.e., Fisher consistent) losses. The bounds generalise the margin-based bounds due to Bartlett et al. (2006). The proof uses Taylor’s theorem and leads to new representations for loss and regret and a simple proof of the integral representation of proper losses. We also present a different formulation of a duality result of Bregman divergences which leads to a simple demonstration of the convexity of composite losses using canonical link functions. 1.
Composite Binary Losses
, 2009
"... We study losses for binary classification and class probability estimation and extend the understanding of them from margin losses to general composite losses which are the composition of a proper loss with a link function. We characterise when margin losses can be proper composite losses, explicitl ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
We study losses for binary classification and class probability estimation and extend the understanding of them from margin losses to general composite losses which are the composition of a proper loss with a link function. We characterise when margin losses can be proper composite losses, explicitly show how to determine a symmetric loss in full from half of one of its partial losses, introduce an intrinsic parametrisation of composite binary losses and give a complete characterisation of the relationship between proper losses and “classification calibrated ” losses. We also consider the question of the “best ” surrogate binary loss. We introduce a precise notion of “best ” and show there exist situations where two convex surrogate losses are incommensurable. We provide a complete explicit characterisation of the convexity of composite binary losses in terms of the link function and the weight function associated with the proper loss which make up the composite loss. This characterisation suggests new ways of “surrogate tuning”. Finally, in an appendix we present some new algorithm-independent results on the relationship between properness, convexity and robustness to misclassification noise for binary losses and show that all convex proper losses are non-robust to misclassification noise. 1
Forecasting with Imprecise Probabilities
"... We review de Finetti’s two coherence criteria for determinate probabilities: coherence1 defined in terms of previsions for a set of random variables that are undominated by the status quo – previsions immune to a sure-loss – and coherence2 defined in terms of forecasts for events undominated in Brie ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
We review de Finetti’s two coherence criteria for determinate probabilities: coherence1 defined in terms of previsions for a set of random variables that are undominated by the status quo – previsions immune to a sure-loss – and coherence2 defined in terms of forecasts for events undominated in Brier score by a rival forecast. We propose a criterion of IP-coherence2 based on a generalization of Brier score for IP-forecasts that uses 1-sided, lower and upper, probability forecasts. However, whereas Brier score is a strictly proper scoring rule for eliciting determinate probabilities, we show that there is no real-valued strictly proper IP-score. Nonetheless, with respect to either of two decision rules – Γ-Maximin or (Levi’s) E-admissibility-+-Γ-Maximin – we give a lexicographic strictly proper IP-scoring rule that is based on Brier score. Keywords. Brier score, coherence, dominance, E-admissibility, Γ-Maximin, proper scoring rules.
Combining Probability Forecasts
, 2008
"... Linear pooling is by the far the most popular method for combining probability forecasts. However, any nontrivial weighted average of two or more distinct, calibrated probability forecasts is necessarily uncalibrated and lacks sharpness. In view of this, linear pooling requires recalibration, even i ..."
Abstract
- Add to MetaCart
Linear pooling is by the far the most popular method for combining probability forecasts. However, any nontrivial weighted average of two or more distinct, calibrated probability forecasts is necessarily uncalibrated and lacks sharpness. In view of this, linear pooling requires recalibration, even in the ideal case in which the individual forecasts are calibrated. Toward this end, we propose a beta transformed linear opinion pool (BLP) for the aggregation of probability forecasts from distinct, calibrated or uncalibrated sources. The BLP method fits an optimal nonlinearly recalibrated forecast combination, by compositing a beta transform and the traditional linear opinion pool. The technique is illustrated in a simulation example and in a case study on statistical and National Weather Service probability of precipitation forecasts.
, PREDICTIVE EVALUATION OF LOGISTIC MODELS
, 1992
"... Logistic models are assessed for their ability to produce valid forecasts for new observables rather than for their goodness of fit to past observations. The diagnostics described here are based on scoring rules and, being updated easily, are particularly well-suited to a sequential context. Test st ..."
Abstract
- Add to MetaCart
Logistic models are assessed for their ability to produce valid forecasts for new observables rather than for their goodness of fit to past observations. The diagnostics described here are based on scoring rules and, being updated easily, are particularly well-suited to a sequential context. Test statistics measuring different aspects of empirical validity are described. Simulations for moderate sample sizes compliment results pertaining to their asymptotic behaviour.

