Results 1  10
of
229
Strictly Proper Scoring Rules, Prediction, and Estimation
, 2007
"... Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he ..."
Abstract

Cited by 151 (17 self)
 Add to MetaCart
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ̸ = F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the Savage representation. Examples of scoring rules for probabilistic forecasts in the form of predictive densities include the logarithmic, spherical, pseudospherical, and quadratic scores. The continuous ranked probability score applies to probabilistic forecasts that take the form of predictive cumulative distribution functions. It generalizes the absolute error and forms a special case of a new and very general type of score, the energy score. Like many other scoring rules, the energy score admits a kernel representation in terms of negative definite functions, with links to inequalities of Hoeffding type, in both univariate and multivariate settings. Proper scoring rules for quantile and interval forecasts are also discussed. We relate proper scoring rules to Bayes factors and to crossvalidation, and propose a novel form of crossvalidation known as randomfold crossvalidation. A case study on probabilistic weather forecasts in the North American Pacific Northwest illustrates the importance of propriety. We note optimum score approaches to point and quantile
Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers
 In Proceedings of the Eighteenth International Conference on Machine Learning
, 2001
"... Accurate, wellcalibrated estimates of class membership probabilities are needed in many supervised learning applications, in particular when a costsensitive decision must be made about examples with exampledependent costs. This paper presents simple but successful methods for obtaining calibrated ..."
Abstract

Cited by 99 (4 self)
 Add to MetaCart
Accurate, wellcalibrated estimates of class membership probabilities are needed in many supervised learning applications, in particular when a costsensitive decision must be made about examples with exampledependent costs. This paper presents simple but successful methods for obtaining calibrated probability estimates from decision tree and naive Bayesian classifiers. Using the large and challenging KDD'98 contest dataset as a testbed, we report the results of a detailed experimental comparison of ten methods, according to four evaluation measures. We conclude that binning succeeds in significantly improving naive Bayesian probability estimates, while for improving decision tree probability estimates, we recommend smoothing by estimation and a new variant of pruning that we call curtailment.
The CEO Problem
 IEEE Trans. Inform. Theory
, 1996
"... automated diagnosis, selfhealing and selfmonitoring systems, statistical induction and ..."
Abstract

Cited by 87 (3 self)
 Add to MetaCart
automated diagnosis, selfhealing and selfmonitoring systems, statistical induction and
Forecast Evaluation and Combination
 IN G.S. MADDALA AND C.R. RAO (EDS.), HANDBOOK OF STATISTICS
, 1996
"... It is obvious that forecasts are of great importance and widely used in economics and finance. Quite simply, good forecasts lead to good decisions. The importance of forecast evaluation and combination techniques follows immediately forecast users naturally have a keen interest in monitoring and ..."
Abstract

Cited by 84 (24 self)
 Add to MetaCart
It is obvious that forecasts are of great importance and widely used in economics and finance. Quite simply, good forecasts lead to good decisions. The importance of forecast evaluation and combination techniques follows immediately forecast users naturally have a keen interest in monitoring and improving forecast performance. More generally, forecast evaluation figures prominently in many questions in empirical economics and finance, such as: Are expectations rational? (e.g., Keane and Runkle, 1990; Bonham and Cohen, 1995) Are financial markets efficient? (e.g., Fama, 1970, 1991) Do macroeconomic shocks cause agents to revise their forecasts at all horizons, or just at short and mediumterm horizons? (e.g., Campbell and Mankiw, 1987; Cochrane, 1988) Are observed asset returns "too volatile"? (e.g., Shiller, 1979; LeRoy and Porter, 1981) Are asset returns forecastable over long horizons? (e.g., Fama and French, 1988; Mark, 1995)
Game Theory, Maximum Entropy, Minimum Discrepancy And Robust Bayesian Decision Theory
 ANNALS OF STATISTICS
, 2004
"... ..."
Logarithmic Market Scoring Rules for Modular Combinatorial Information Aggregation
 Journal of Prediction Markets
, 2002
"... In practice, scoring rules elicit good probability estimates from individuals, while betting markets elicit good consensus estimates from groups. Market scoring rules combine these features, eliciting estimates from individuals or groups, with groups costing no more than individuals. ..."
Abstract

Cited by 74 (5 self)
 Add to MetaCart
In practice, scoring rules elicit good probability estimates from individuals, while betting markets elicit good consensus estimates from groups. Market scoring rules combine these features, eliciting estimates from individuals or groups, with groups costing no more than individuals.
Asymptotic calibration
, 1998
"... Can we forecast the probability of an arbitrary sequence of events happening so that the stated probability of an event happening is close to its empirical probability? We can view this prediction problem as a game played against Nature, where at the beginning of the game Nature picks a data sequenc ..."
Abstract

Cited by 70 (4 self)
 Add to MetaCart
Can we forecast the probability of an arbitrary sequence of events happening so that the stated probability of an event happening is close to its empirical probability? We can view this prediction problem as a game played against Nature, where at the beginning of the game Nature picks a data sequence and the forecaster picks a forecasting algorithm. If the forecaster is not allowed to randomise, then Nature wins; there will always be data for which the forecaster does poorly. This paper shows that, if the forecaster can randomise, the forecaster wins in the sense that the forecasted probabilities and the empirical probabilities can be made arbitrarily close to each other.
A Hybrid Ensemble Kalman Filter / 3DVariational Analysis Scheme
"... A hybrid 3dimensional variational (3DVar) / ensemble Kalman filter analysis scheme is demonstrated using a quasigeostrophic model under perfectmodel assumptions. Four networks with differing observational densities are tested, including one network with a data void. The hybrid scheme operates by ..."
Abstract

Cited by 60 (15 self)
 Add to MetaCart
A hybrid 3dimensional variational (3DVar) / ensemble Kalman filter analysis scheme is demonstrated using a quasigeostrophic model under perfectmodel assumptions. Four networks with differing observational densities are tested, including one network with a data void. The hybrid scheme operates by computing a set of parallel data assimilation cycles, with each member of the set receiving unique perturbed observations. The perturbed observations are generated by adding random noise consistent with observation error statistics to the control set of observations. Background error statistics for the data assimilation are estimated from a linear combination of timeinvariant 3DVar covariances and flowdependent covariances developed from the ensemble of shortrange forecasts. The hybrid scheme allows the user to weight the relative contributions of the 3DVar and ensemblebased background covariances. The analysis scheme was cycled for 90 days, with new observations assimilated every 12 h...
Interpretation Of Rank Histograms For Verifying Ensemble Forecasts
, 2000
"... Rank histograms are a tool for evaluating ensemble forecasts. They are useful for determining the reliability of ensemble forecasts and for diagnosing errors in its mean and spread. Rank histograms are generated by repeatedly tallying the rank of the verification (usually, an observation) relative t ..."
Abstract

Cited by 52 (6 self)
 Add to MetaCart
Rank histograms are a tool for evaluating ensemble forecasts. They are useful for determining the reliability of ensemble forecasts and for diagnosing errors in its mean and spread. Rank histograms are generated by repeatedly tallying the rank of the verification (usually, an observation) relative to values from an ensemble sorted from lowest to highest. However, an uncritical use of the rank histogram can lead to misinterpretations of the qualities of that ensemble. For example, a flat rank histogram, ususally taken as a sign of reliability, can still be generated from unreliable ensembles. Similarly, a Ushaped rank histogram, commonly understood as indicating a lack of variability in the ensemble, can also be a sign of conditional bias. It is also shown that flat rank histograms can be generated for some model variables if the variance of the ensemble is correctly specified, yet if covariances between model grid points are improperly specified, rank histograms for combinations of mo...
Ensembles of models for automated diagnosis of system performance problems
 In DSN
, 2005
"... Violations of service level objectives (SLO) in Internet services are urgent conditions requiring immediate attention. Previously we showed [1] that TreeAugmented Bayesian Networks or TAN models are effective at identifying which lowlevel system properties were correlated to highlevel SLO violati ..."
Abstract

Cited by 46 (9 self)
 Add to MetaCart
Violations of service level objectives (SLO) in Internet services are urgent conditions requiring immediate attention. Previously we showed [1] that TreeAugmented Bayesian Networks or TAN models are effective at identifying which lowlevel system properties were correlated to highlevel SLO violations (the metric attribution problem) under stable workloads. In this paper we extend our approach to adapt to changing workloads and external disturbances by maintaining an ensemble of probabilistic models, adding new models when existing ones do not accurately capture current system behavior. Using realistic workloads on an implemented prototype system, we show that the ensemble of TAN models captures the performance behavior of the system accurately under changing workloads and conditions. We fuse diagnoses from the ensemble of models to identify likely causes of the performance problem, with results comparable to those produced by an oracle that continuously changes the model based on advance knowledge of the workload. The cost of inducing new models and managing the ensembles is negligible, making our approach both immediately practical and theoretically appealing.