Strictly Proper Scoring Rules, Prediction, and Estimation
, 2007
Cited by 182 (17 self)
Abstract
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the Savage representation. Examples of scoring rules for probabilistic forecasts in the form of predictive densities include the logarithmic, spherical, pseudospherical, and quadratic scores. The continuous ranked probability score applies to probabilistic forecasts that take the form of predictive cumulative distribution functions. It generalizes the absolute error and forms a special case of a new and very general type of score, the energy score. Like many other scoring rules, the energy score admits a kernel representation in terms of negative definite functions, with links to inequalities of Hoeffding type, in both univariate and multivariate settings. Proper scoring rules for quantile and interval forecasts are also discussed. We relate proper scoring rules to Bayes factors and to cross-validation, and propose a novel form of cross-validation known as random-fold cross-validation.
A case study on probabilistic weather forecasts in the North American Pacific Northwest illustrates the importance of propriety. We note optimum score approaches to point and quantile
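To make the scores named in this abstract concrete, here is a minimal sketch of the logarithmic and quadratic (Brier) scores for a binary event and of the continuous ranked probability score (CRPS). The CRPS is given both in closed form for a Gaussian predictive distribution and through the kernel form E|X − y| − ½E|X − X′| for an ensemble. All function names are hypothetical. Note that every score here is written as a loss (smaller is better), whereas the abstract's convention has the forecaster maximize the expected score; flipping the sign converts between the two.

```python
import math


def log_score(p_outcome: float) -> float:
    """Logarithmic score as a loss: minus the log of the probability
    (or density value) the forecast assigned to what materialized."""
    return -math.log(p_outcome)


def brier_score(p: float, y: int) -> float:
    """Quadratic (Brier) score as a loss for a binary event.
    p is the forecast probability of the event; y is 1 if it occurred, else 0."""
    return (p - y) ** 2


def expected_brier(p: float, q: float) -> float:
    """Expected Brier loss of reporting p when the event truly has probability q."""
    return q * brier_score(p, 1) + (1 - q) * brier_score(p, 0)


def crps_gaussian(mu: float, sigma: float, y: float) -> float:
    """Closed-form CRPS (as a loss) of a N(mu, sigma^2) forecast at observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


def crps_ensemble(members: list[float], y: float) -> float:
    """CRPS of an ensemble's empirical distribution via the kernel form
    E|X - y| - 0.5 * E|X - X'|, with X, X' drawn independently from the members."""
    m = len(members)
    spread_to_obs = sum(abs(x - y) for x in members) / m
    spread_within = sum(abs(a - b) for a in members for b in members) / (m * m)
    return spread_to_obs - 0.5 * spread_within


# Propriety check: under a true probability q, the expected Brier loss equals
# q(1 - q) + (p - q)^2, so the honest report p = q minimizes it.
q = 0.3
grid = [i / 100 for i in range(101)]
best_report = min(grid, key=lambda p: expected_brier(p, q))
print(best_report)                              # 0.3
print(crps_ensemble([2.0], 5.0))                # 3.0 (one member: absolute error)
print(round(crps_gaussian(0.0, 1.0, 0.0), 4))   # 0.2337
```

The one-member case shows the CRPS reducing to the absolute error, as the abstract states. Replacing the absolute values in `crps_ensemble` with Euclidean norms gives the multivariate energy score with exponent 1.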
Probabilistic forecasts, calibration and sharpness
 Journal of the Royal Statistical Society Series B
, 2007
Cited by 53 (16 self)
Abstract
Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only. A simple theoretical framework allows us to distinguish between probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest. In combination with cross-validation or in the time series context, our proposal provides very general, nonparametric alternatives to the use of information criteria for model diagnostics and model selection.
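The probability integral transform (PIT) histogram mentioned above can be sketched as follows, assuming Gaussian predictive CDFs and simulated observations instead of the wind-speed data. For each observation y, the PIT value is u = F(y) under that forecast's CDF F. A calibrated forecaster should produce a roughly flat histogram. An underdispersed forecaster is sharper but overconfident, and tends to produce a U shape, which illustrates why sharpness counts only subject to calibration. The helper names `normal_cdf` and `pit_histogram` are illustrative.

```python
import math
import random


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2), used here as the predictive CDF F."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def pit_histogram(pit_values: list[float], bins: int = 10) -> list[int]:
    """Counts of PIT values u = F(y) in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for u in pit_values:
        counts[min(int(u * bins), bins - 1)] += 1  # u == 1.0 falls in the last bin
    return counts


rng = random.Random(42)
observations = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # simulated truth: N(0, 1)

# Ideal forecaster: predictive distribution equals the data-generating one.
calibrated = pit_histogram([normal_cdf(y, 0.0, 1.0) for y in observations])
# Sharper but underdispersed forecaster: too little spread gives a U-shaped histogram.
underdispersed = pit_histogram([normal_cdf(y, 0.0, 0.5) for y in observations])

print(calibrated)
print(underdispersed)
```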
Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications, manuscript, available at www-stat.wharton.upenn.edu/~buja
, 2005
Cited by 36 (1 self)
Abstract
What are the natural loss functions or fitting criteria for binary class probability estimation? This question has a simple answer: so-called “proper scoring rules”, that is, functions that score probability estimates in view of data in a Fisher-consistent manner. Proper scoring rules comprise most loss functions currently in use: log-loss, squared error loss, boosting loss, and, as limiting cases, cost-weighted misclassification losses. Proper scoring rules have a rich structure:
• Every proper scoring rule is a mixture (limit of sums) of cost-weighted misclassification losses. The mixture is specified by a weight function (or measure) that describes which misclassification cost weights are most emphasized by the proper scoring rule.
• Proper scoring rules permit Fisher scoring and Iteratively Reweighted Least Squares algorithms for model fitting. The weights are derived from a link function and the above weight function.
• Proper scoring rules are in one-to-one correspondence with information measures for tree-based classification.
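The mixture representation in the first bullet can be checked numerically. The sketch below assumes one common convention for the cost-weighted misclassification loss at cost c in (0, 1): classify as 1 if and only if q > c, pay 1 − c for a missed 1 and c for a false 1. Integrating this loss against the uniform weight w(c) = 1 reproduces half the squared error. Integrating it against w(c) = 1/(c(1 − c)) reproduces log-loss. The function names are illustrative, and the midpoint rule is only an approximation of the integral.

```python
import math


def cost_weighted_loss(y: int, q: float, c: float) -> float:
    """Misclassification loss at cost c: classify as 1 iff q > c.
    A missed 1 costs (1 - c); a false 1 costs c; correct decisions cost 0."""
    if y == 1:
        return (1 - c) if q <= c else 0.0
    return c if q > c else 0.0


def mixture_loss(y: int, q: float, weight, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral over c in (0, 1)
    of weight(c) * cost_weighted_loss(y, q, c)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        c = (i + 0.5) * h
        total += weight(c) * cost_weighted_loss(y, q, c)
    return total * h


def uniform_weight(c: float) -> float:
    return 1.0  # yields half the squared error loss


def log_weight(c: float) -> float:
    return 1.0 / (c * (1 - c))  # yields log-loss


q = 0.3
print(mixture_loss(1, q, uniform_weight), (1 - q) ** 2 / 2)  # both about 0.245
print(mixture_loss(0, q, uniform_weight), q ** 2 / 2)        # both about 0.045
print(mixture_loss(1, q, log_weight), -math.log(q))          # both about 1.204
print(mixture_loss(0, q, log_weight), -math.log(1 - q))      # both about 0.357
```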
Eliciting Properties of Probability Distributions
 In Proceedings of the ninth ACM conference on electronic commerce
, 2008
Cited by 22 (5 self)
Abstract
We investigate the problem of incentivizing an expert to truthfully reveal probabilistic information about a random event. Probabilistic information consists of one or more properties, which are any real-valued functions of the distribution, such as the mean and variance. Not all properties can be elicited truthfully. We provide a simple characterization of elicitable properties, and describe the general form of the associated payment functions that induce truthful revelation. We then consider sets of properties, and observe that all properties can be inferred from sets of elicitable properties. This suggests the concept of elicitation complexity for a property: the size of the smallest set implying the property.
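A small illustration of elicitable properties follows. For a finite distribution, squared-error payments elicit the mean, and pinball (quantile) payments elicit a quantile, because the expert's expected loss is minimized exactly at the true value. The variance is known not to be elicitable on its own. Instead, the sketch recovers it from the elicited pair (E[X], E[X²]), in the spirit of the abstract's point about inferring properties from sets of elicitable ones. The distribution, the grid search, and all names are illustrative assumptions, and payments are written as losses to be minimized.

```python
# Hypothetical discrete distribution known to the expert.
values = [0.0, 1.0, 2.0, 10.0]
probs = [0.4, 0.3, 0.2, 0.1]


def expected_loss(report: float, loss, transform=lambda x: x) -> float:
    """Expected loss of a report when the outcome being scored is transform(X)."""
    return sum(p * loss(report, transform(x)) for x, p in zip(values, probs))


def squared_loss(r: float, y: float) -> float:
    return (r - y) ** 2


def pinball_loss(r: float, y: float, tau: float = 0.5) -> float:
    """Quantile (pinball) loss at level tau."""
    return ((1.0 if y < r else 0.0) - tau) * (r - y)


def best_report(loss, grid, transform=lambda x: x) -> float:
    """Report minimizing the expert's expected loss over a grid of candidates."""
    return min(grid, key=lambda r: expected_loss(r, loss, transform))


grid = [i / 100 for i in range(10_001)]  # candidate reports 0.00 .. 100.00
mean = best_report(squared_loss, grid)                             # elicits E[X]
second_moment = best_report(squared_loss, grid, lambda x: x * x)   # elicits E[X^2]
median = best_report(pinball_loss, grid)                           # elicits the 0.5-quantile
print(mean, second_moment, median)   # 1.7 11.1 1.0
print(second_moment - mean ** 2)     # variance, about 8.21, implied by the pair
```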
Information, Divergence and Risk for Binary Experiments
 Journal of Machine Learning Research
, 2009
Cited by 19 (6 self)
Abstract
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC curves and statistical information. We do this by systematically studying integral and variational representations of these various objects, and in so doing we identify their primitives, which are all related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
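To make the f-divergence vocabulary concrete, the sketch below computes D_f(P‖Q) = Σ q_i f(p_i / q_i) for discrete distributions with every q_i > 0. It recovers the Kullback-Leibler divergence (f(t) = t log t) and the variational divergence (f(t) = |t − 1|, i.e. Σ|p_i − q_i|). It then checks the classical Pinsker inequality KL ≥ V²/2. This is the textbook inequality, not the generalised Pinsker inequalities derived in the paper, and the function names are illustrative.

```python
import math


def f_divergence(p: list[float], q: list[float], f) -> float:
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i), assuming every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t: float) -> float:
    # t log t, with the convention 0 log 0 = 0
    return t * math.log(t) if t > 0 else 0.0


def variational_generator(t: float) -> float:
    return abs(t - 1)


P = [0.5, 0.5]
Q = [0.8, 0.2]
kl = f_divergence(P, Q, kl_generator)
v = f_divergence(P, Q, variational_generator)
print(kl, v)               # KL = 0.5 * ln(1.5625), about 0.2231; V = 0.6
print(kl >= v ** 2 / 2)    # True: Pinsker's inequality
```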
Composite Binary Losses
, 2009
Cited by 13 (9 self)
Abstract
We study losses for binary classification and class probability estimation and extend the understanding of them from margin losses to general composite losses, which are the composition of a proper loss with a link function. We characterise when margin losses can be proper composite losses, explicitly show how to determine a symmetric loss in full from half of one of its partial losses, introduce an intrinsic parametrisation of composite binary losses and give a complete characterisation of the relationship between proper losses and “classification calibrated” losses. We also consider the question of the “best” surrogate binary loss. We introduce a precise notion of “best” and show there exist situations where two convex surrogate losses are incommensurable. We provide a complete explicit characterisation of the convexity of composite binary losses in terms of the link function and the weight function associated with the proper loss that make up the composite loss. This characterisation suggests new ways of “surrogate tuning”. Finally, in an appendix we present some new algorithm-independent results on the relationship between properness, convexity and robustness to misclassification noise for binary losses and show that all convex proper losses are non-robust to misclassification noise.
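As a small example of a composite loss, compose the proper log-loss with the inverse of the logit link ψ(q) = log(q / (1 − q)). The result is exactly the familiar logistic margin loss log(1 + e^(−s·v)) for labels s in {−1, +1}, which is one way a margin loss can also be a proper composite loss. The helper names are illustrative.

```python
import math


def log_loss(y: int, q: float) -> float:
    """Proper log-loss for class probability q = P(y = 1), with y in {0, 1}."""
    return -math.log(q) if y == 1 else -math.log(1 - q)


def inverse_logit(v: float) -> float:
    """Inverse of the logit link: maps a real-valued prediction v to (0, 1)."""
    return 1 / (1 + math.exp(-v))


def composite_log_loss(y: int, v: float) -> float:
    """Composite loss: the proper log-loss applied to inverse_logit(v)."""
    return log_loss(y, inverse_logit(v))


def logistic_margin_loss(s: int, v: float) -> float:
    """Margin form for labels s in {-1, +1}: log(1 + exp(-s * v))."""
    return math.log1p(math.exp(-s * v))


for v in (-2.0, 0.0, 1.5):
    # y = 1 corresponds to s = +1, and y = 0 to s = -1
    print(v, composite_log_loss(1, v), logistic_margin_loss(+1, v))
```

The logit is commonly described as the canonical link for log-loss. This is consistent with the composite loss above being convex in v.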
Accuracy and Coherence: Prospects for an Alethic Epistemology of Partial Belief
Cited by 12 (0 self)
Abstract
Traditional epistemology is both dogmatic and alethic. It is dogmatic in the sense that it takes the fundamental doxastic attitude to be full belief, the state in which a person categorically accepts some proposition as true. It is alethic in the sense that it evaluates such categorical beliefs on the basis of what William James calls the ‘two great commandments’ of epistemology: Believe the truth! Avoid error! Other central concepts of dogmatic epistemology – knowledge, justification, reliability, sensitivity, and so on – are understood in terms of their relationships to this ultimate standard of truth or accuracy. Some epistemologists, inspired by Bayesian approaches in decision theory and statistics, have sought to replace the dogmatic model with a probabilistic one in which partial beliefs, or credences, play the leading role. A person’s credence in a proposition X is her level of confidence in its truth. This corresponds, roughly, to the degree to which she is disposed to presuppose X in her theoretical and practical reasoning. Credences are inherently gradational: the strength of a partial belief in X can range from certainty of truth, through maximal uncertainty (in which X and its negation ∼X are believed equally strongly), to complete certainty of falsehood. These variations in confidence are warranted by differing states of evidence, and they rationalize different choices among options whose outcomes depend on X. It is a central normative doctrine of probabilistic epistemology that rational credences should obey the laws of probability. In the idealized case where a believer has a numerically precise credence b(X) for every proposition X in some Boolean algebra of propositions, these laws are as follows:
Surrogate Regret Bounds for Proper Losses
Cited by 7 (1 self)
Abstract
We present tight surrogate regret bounds for the class of proper (i.e., Fisher-consistent) losses. The bounds generalise the margin-based bounds due to Bartlett et al. (2006). The proof uses Taylor’s theorem and leads to new representations for loss and regret and a simple proof of the integral representation of proper losses. We also present a different formulation of a duality result for Bregman divergences, which leads to a simple demonstration of the convexity of composite losses using canonical link functions.
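The link between regret and Bregman divergences can be checked numerically. For a proper binary loss, the regret of predicting q when P(y = 1) = η equals the Bregman divergence generated by the negative Bayes risk. The sketch below verifies this for the squared (Brier) loss, where the regret is (η − q)². It also verifies it for log-loss, where the regret is the Kullback-Leibler divergence between Bernoulli(η) and Bernoulli(q). This is a standard identity shown as a sketch, not the paper's Taylor-expansion proof, and all names are illustrative.

```python
import math


def brier_partial(y: int, q: float) -> float:
    return (1 - q) ** 2 if y == 1 else q ** 2


def log_partial(y: int, q: float) -> float:
    return -math.log(q) if y == 1 else -math.log(1 - q)


def conditional_risk(partial, eta: float, q: float) -> float:
    """Expected loss of predicting q when P(y = 1) = eta."""
    return eta * partial(1, q) + (1 - eta) * partial(0, q)


def regret(partial, eta: float, q: float) -> float:
    """Excess conditional risk of q over the Bayes-optimal prediction eta."""
    return conditional_risk(partial, eta, q) - conditional_risk(partial, eta, eta)


def bregman(phi, dphi, eta: float, q: float) -> float:
    """Bregman divergence of a differentiable convex function phi on (0, 1)."""
    return phi(eta) - phi(q) - dphi(q) * (eta - q)


# Negative Bayes risks: Brier has Bayes risk eta(1 - eta); log-loss has
# the binary entropy as Bayes risk.
def brier_phi(e: float) -> float:
    return e * e - e


def brier_dphi(e: float) -> float:
    return 2 * e - 1


def neg_entropy(e: float) -> float:
    return e * math.log(e) + (1 - e) * math.log(1 - e)


def neg_entropy_d(e: float) -> float:
    return math.log(e / (1 - e))


for eta, q in [(0.2, 0.7), (0.5, 0.1), (0.9, 0.6)]:
    print(regret(brier_partial, eta, q), bregman(brier_phi, brier_dphi, eta, q))
    print(regret(log_partial, eta, q), bregman(neg_entropy, neg_entropy_d, eta, q))
```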
Elicitation and evaluation of statistical forecasts
, 2010
Cited by 5 (0 self)
Abstract
This paper studies mechanisms for eliciting and evaluating statistical forecasts. Nature draws a state at random from a given state space, according to some distribution p. Prior to Nature’s move, a forecaster, who knows p, provides a prediction for a given statistic of p. The mechanism defines the forecaster’s payoff as a function of the prediction and the subsequently realized state. When the statistic is continuous with a continuum of values, the payoffs that provide strict incentives to the forecaster exist if and only if the statistic partitions the set of distributions into convex subsets. When the underlying state space is finite, and the statistic takes values in a finite set, these payoffs exist if and only if the partition forms a linear cross-section of a Voronoi diagram (that is, if the partition forms a power diagram), a stronger condition than convexity. In both cases, the payoffs can be fully characterized essentially as weighted averages of base functions. Preliminary versions appear in the proceedings of the 9th and 10th ACM Conference on Electronic
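A simple finite case can be sketched as follows, offered as an illustration rather than the paper's construction. To elicit the mode of p over states {0, ..., k − 1}, pay 1 when the reported state materializes and 0 otherwise. The expected payoff of reporting i is then p_i, so the truthful report is the mode. The cells {p : p_i ≥ p_j for all j} coincide with the Voronoi cells of the simplex vertices e_i, because ‖p − e_i‖² = ‖p‖² − 2p_i + 1. The code checks both facts on random distributions, and all names are illustrative.

```python
import random


def payoff(report: int, state: int) -> float:
    """Pay 1 if the reported state materializes, else 0."""
    return 1.0 if report == state else 0.0


def best_report(p: list[float]) -> int:
    """Report maximizing the expected payoff under distribution p."""
    k = len(p)
    return max(range(k), key=lambda r: sum(p[s] * payoff(r, s) for s in range(k)))


def nearest_vertex(p: list[float]) -> int:
    """Index of the simplex vertex e_i closest to p in squared Euclidean distance."""
    k = len(p)

    def dist2(i: int) -> float:
        return sum((p[j] - (1.0 if j == i else 0.0)) ** 2 for j in range(k))

    return min(range(k), key=dist2)


rng = random.Random(0)
for _ in range(1000):
    w = [rng.random() for _ in range(4)]
    total = sum(w)
    p = [x / total for x in w]
    mode = max(range(4), key=lambda i: p[i])
    assert best_report(p) == mode == nearest_vertex(p)
print("truthful report and nearest-vertex cell agree on 1000 random distributions")
```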
2004: Assessing the skill of yes/no forecasts for Markov observations
 17th Conf. on Probability and Statistics in the Atmospheric Sciences
Cited by 4 (3 self)
Abstract
introduced a new, easy-to-calculate economic skill score for use in yes/no forecast decisions, of which precipitation forecast decisions are an example. The advantage of this new climate skill score is that the sampling distribution is known, which allows one to perform hypothesis tests on collections of forecasts and to say whether a given skill score is significant or not. Skill, as ever, is defined as improvement over an optimal naive prediction. We show that the optimal naive prediction depends on both the base rate (the climatology) of the event being forecasted, and the loss one would incur if one were to make an incorrect decision based on the forecast. Here, we take the climate skill score and extend it to the case where the predicted series is first-order Markov in nature, of which, again, precipitation occurrence series are an example. We show that Markov skill is different from, and more demanding than, persistence skill. Persistence skill is defined as improvement over forecasts which state that the next value in a series will equal the present value. We also define the optimal naive prediction in the Markov case. Surprisingly, it turns out that the form of the Markov skill score is identical to the climate skill score, making calculations simple. The distribution of the Markov skill score is, however, more complex than that of the climate skill score. The distribution for the Markov skill score is presented, and examples of hypothesis testing for precipitation forecasts are given. We graph these skill scores for a wide range of forecast-user loss functions, a process which makes their interpretation simple.
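The abstract's point that the optimal naive prediction depends on both the base rate and the user's losses can be illustrated with a short example. It assumes a simple linear loss: cost a for a false alarm ("yes" when the event does not occur) and cost b for a miss. It then compares the two constant forecasts. It does not reproduce the paper's climate or Markov skill scores, whose formulas are not given here, and all names are hypothetical.

```python
def expected_loss_always_yes(base_rate: float, false_alarm_cost: float) -> float:
    # Always forecasting "yes" errs only when the event does not occur.
    return (1 - base_rate) * false_alarm_cost


def expected_loss_always_no(base_rate: float, miss_cost: float) -> float:
    # Always forecasting "no" errs only when the event occurs.
    return base_rate * miss_cost


def optimal_naive_forecast(base_rate: float, false_alarm_cost: float, miss_cost: float) -> str:
    """Constant forecast ("yes" or "no") with the lower expected loss."""
    yes = expected_loss_always_yes(base_rate, false_alarm_cost)
    no = expected_loss_always_no(base_rate, miss_cost)
    return "yes" if yes < no else "no"


# Same climatology (30% chance of rain), two users with different losses:
print(optimal_naive_forecast(0.3, false_alarm_cost=1.0, miss_cost=1.0))  # no
print(optimal_naive_forecast(0.3, false_alarm_cost=1.0, miss_cost=5.0))  # yes
```

Under this assumed loss, "yes" is the better constant forecast exactly when the base rate exceeds a / (a + b).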