Strictly Proper Scoring Rules, Prediction, and Estimation
, 2007
Abstract

Cited by 143 (17 self)
Scoring rules assess the quality of probabilistic forecasts by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the Savage representation. Examples of scoring rules for probabilistic forecasts in the form of predictive densities include the logarithmic, spherical, pseudospherical, and quadratic scores. The continuous ranked probability score applies to probabilistic forecasts that take the form of predictive cumulative distribution functions. It generalizes the absolute error and forms a special case of a new and very general type of score, the energy score. Like many other scoring rules, the energy score admits a kernel representation in terms of negative definite functions, with links to inequalities of Hoeffding type, in both univariate and multivariate settings. Proper scoring rules for quantile and interval forecasts are also discussed. We relate proper scoring rules to Bayes factors and to cross-validation, and propose a novel form of cross-validation known as random-fold cross-validation.
A case study on probabilistic weather forecasts in the North American Pacific Northwest illustrates the importance of propriety. We note optimum score approaches to point and quantile
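The definition of propriety in the abstract above can be checked numerically for a binary event. The sketch below is illustrative only and is not taken from the paper: the function names, the grid search, and the choice of p = 0.7 are assumptions. It compares the logarithmic and quadratic scores (both strictly proper) with the linear score, which is a standard example of an improper rule.

```python
import math

def log_score(q, outcome):
    """Logarithmic score: log of the probability assigned to what happened."""
    return math.log(q if outcome == 1 else 1.0 - q)

def quadratic_score(q, outcome):
    """Quadratic score for a binary event, positively oriented (0 is best)."""
    return -((outcome - q) ** 2)

def linear_score(q, outcome):
    """Linear score: the probability assigned to what happened (improper)."""
    return q if outcome == 1 else 1.0 - q

def expected_score(rule, q, p):
    """Expected score of reporting q when the event truly has probability p."""
    return p * rule(q, 1) + (1.0 - p) * rule(q, 0)

def best_report(rule, p, grid_size=999):
    """Report in {0.001, ..., 0.999} that maximizes the expected score."""
    grid = [i / (grid_size + 1) for i in range(1, grid_size + 1)]
    return max(grid, key=lambda q: expected_score(rule, q, p))

if __name__ == "__main__":
    p_true = 0.7  # assumed true probability, chosen for illustration
    for rule in (log_score, quadratic_score, linear_score):
        print(rule.__name__, best_report(rule, p_true))
```

Under the two proper rules the best report coincides with the true probability, whereas the linear score rewards pushing the report toward the edge of the grid, which is the kind of dishonest hedging that propriety rules out.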
Confirmation Bias: A Ubiquitous Phenomenon in Many Guises
 Review of General Psychology
, 1998
Abstract

Cited by 101 (0 self)
Confirmation bias, as the term is typically used in the psychological literature, connotes the seeking or interpreting of evidence in ways that are partial to existing beliefs, expectations, or a hypothesis in hand. The author reviews evidence of such a bias in a variety of guises and gives examples of its operation in several practical contexts. Possible explanations are considered, and the question of its utility or disutility is discussed. When men wish to construct or support a theory, how they torture facts into their service! (Mackay, 1852/1932, p. 552) Confirmation bias is perhaps the best known and most widely accepted notion of inferential error to come out of the literature on human reasoning. (Evans, 1989, p. 41) If one were to attempt to identify a single problematic aspect of human reasoning that deserves attention above all others, the confirmation bias would have to be among the candidates for consideration. Many have written about this bias, and it appears to be sufficiently strong and pervasive that one is led to wonder whether the bias, by itself, might account for a significant fraction of the disputes, altercations, and misunderstandings that occur among individuals, groups, and nations. Confirmation bias has been used in the psychological literature to refer to a variety of phenomena. Here I take the term to represent a generic concept that subsumes several more specific ideas that connote the inappropriate bolstering of hypotheses or beliefs whose truth is in question.
System Identification, Approximation and Complexity
 International Journal of General Systems
, 1977
Abstract

Cited by 34 (23 self)
This paper is concerned with establishing broadly-based system-theoretic foundations and practical techniques for the problem of system identification that are rigorous, intuitively clear and conceptually powerful. A general formulation is first given in which two order relations are postulated on a class of models: a constant one of complexity; and a variable one of approximation induced by an observed behaviour. An admissible model is such that any less complex model is a worse approximation. The general problem of identification is that of finding the admissible subspace of models induced by a given behaviour. It is proved under very general assumptions that, if deterministic models are required, then nearly all behaviours require models of nearly maximum complexity. A general theory of approximation between models and behaviour is then developed based on subjective probability concepts and semantic information theory. The role of structural constraints such as causality, locality, finite memory, etc., is then discussed as rules of the game. These concepts and results are applied to the specific problem of stochastic automaton, or grammar, inference. Computational results are given to demonstrate that the theory is complete and fully operational. Finally the formulation of identification proposed in this paper is analysed in terms of Klir’s epistemological hierarchy and both are discussed in terms of the rich philosophical literature on the acquisition of knowledge.
A dynamic parimutuel market for hedging, wagering, and information aggregation
 In Proceedings of the Fifth ACM Conference on Electronic Commerce (EC’04)
, 2004
Abstract

Cited by 34 (7 self)
I develop a new mechanism for risk allocation and information speculation called a dynamic parimutuel market (DPM). A DPM acts as a hybrid between a parimutuel market and a continuous double auction (CDA), inheriting some of the advantages of both. Like a parimutuel market, a DPM offers infinite buy-in liquidity and zero risk for the market institution; like a CDA, a DPM can continuously react to new information, dynamically incorporate information into prices, and allow traders to lock in gains or limit losses by selling prior to event resolution. The trader interface can be designed to mimic the familiar double auction format with bid-ask queues, though with an additional variable called the payoff per share. The DPM price function can be viewed as an automated market maker always offering to sell at some price, and moving the price appropriately according to demand. Since the mechanism is parimutuel (i.e., redistributive), it is guaranteed to pay out exactly the amount of money taken in. I explore a number of variations on the basic DPM, analyzing the properties of each, and solving in closed form for their respective price functions.
Extracting Collective Probabilistic Forecasts from Web Games
, 2001
Abstract

Cited by 30 (10 self)
Game sites on the World Wide Web draw people from around the world with specialized interests, skills, and knowledge.
Betting Boolean-Style: A Framework for Trading in Securities Based on Logical Formulas
, 2003
Abstract

Cited by 30 (17 self)
We develop a framework for trading in compound securities: financial instruments that pay off contingent on the outcomes of arbitrary statements in propositional logic. Buying or selling securities (which can be thought of as betting on or against a particular future outcome) allows agents both to hedge risk and to profit (in expectation) on subjective predictions. A compound securities market allows agents to place bets on arbitrary boolean combinations of events, enabling them to more closely achieve their optimal risk exposure, and enabling the market as a whole to more closely achieve the social optimum. The tradeoff for allowing such expressivity is in the complexity of the agents' and auctioneer's optimization problems.
Information incorporation in online in-game sports betting markets
 ELECTRONIC COMMERCE
, 2003
Abstract

Cited by 28 (9 self)
We analyze data from 52 online in-game sports betting markets (where betting is allowed continuously throughout a game), including 34 markets based on soccer (European football) games from the 2002 World Cup, and 18 basketball games from the 2002 USA National Basketball Association (NBA) championship. We show that prices on average approach the correct outcome over time, and the price dynamics in the markets are closely coupled with game events, agreeing with efficient market assumptions. We also examine qualitative distinctions between the two types of games.
du Preez, “Application-independent evaluation of speaker detection”
 Computer Speech and Language
, 2006
Abstract

Cited by 28 (2 self)
We present a Bayesian analysis of the evaluation of speaker detection performance. We use expectation of utility to confirm that likelihood-ratio is both an optimum and application-independent form of output for speaker detection systems. We point out that the problem of likelihood-ratio calculation is equivalent to the problem of optimization of decision thresholds. It is shown that the decision cost that is used in the existing NIST evaluations effectively forms a utility (a proper scoring rule) for the evaluation of the quality of likelihood-ratio presentation. As an alternative, a logarithmic utility (a strictly proper scoring rule) is proposed. Finally, an information-theoretic interpretation of the expected logarithmic utility is given. It is hoped that this analysis and the proposed evaluation method will promote the use of likelihood-ratio detector output rather than decision output.
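To make the idea of scoring a likelihood-ratio output concrete, here is a minimal sketch under stated assumptions. It is not the evaluation measure defined in the paper: the function names, the example likelihood ratios, and the prior of 0.5 are hypothetical. It only uses Bayes' rule (posterior odds = likelihood ratio × prior odds) and the logarithmic score of the resulting posterior, so the detector's output stays independent of the application while the prior is supplied at evaluation time.

```python
import math

def posterior_from_lr(likelihood_ratio, prior_target):
    """Posterior probability of the target hypothesis via Bayes' rule."""
    prior_odds = prior_target / (1.0 - prior_target)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

def log_utility(likelihood_ratio, is_target, prior_target):
    """Logarithmic score of the posterior assigned to the true hypothesis."""
    p = posterior_from_lr(likelihood_ratio, prior_target)
    return math.log(p if is_target else 1.0 - p)

def mean_log_utility(trials, prior_target):
    """Average logarithmic utility over (likelihood_ratio, is_target) trials."""
    return sum(log_utility(lr, t, prior_target) for lr, t in trials) / len(trials)

if __name__ == "__main__":
    # Hypothetical detector outputs: (likelihood ratio, whether it was a target trial).
    trials = [(8.0, True), (3.0, True), (0.2, False), (1.5, False)]
    print(mean_log_utility(trials, prior_target=0.5))
```

A detector that always outputs a likelihood ratio of 1 leaves the prior unchanged, which gives a natural reference level against which the average above can be compared.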
Evaluating and combining subjective probability estimates
 Journal of Behavioral Decision Making
, 1997
Abstract

Cited by 18 (4 self)
This paper concerns the evaluation and combination of subjective probability estimates for categorical events. We argue that the appropriate criterion for evaluating individual and combined estimates depends on the type of uncertainty the decision maker seeks to represent, which in turn depends on his or her model of the event space. Decision makers require accurate estimates in the presence of aleatory uncertainty about exchangeable events, diagnostic estimates given epistemic uncertainty about unique events, and some combination of the two when the events are not necessarily unique, but the best equivalence class definition for exchangeable events is not apparent. Following a brief review of the mathematical and empirical literature on combining judgments, we present an approach to the topic that derives from (1) a weak cognitive model of the individual that assumes subjective estimates are a function of underlying judgment perturbed by random error and (2) a classification of judgment contexts in terms of the underlying information structure. In support of our developments, we present new analyses of two sets of subjective probability estimates, one of exchangeable and the other of unique events. As predicted, mean estimates were more accurate than the individual values in the first case and more diagnostic in
An empirical comparison of algorithms for aggregating expert predictions
 In UAI
, 2006
Abstract

Cited by 13 (3 self)
Predicting the outcomes of future events is a challenging problem for which a variety of solution methods have been explored and attempted. We present an empirical comparison of a variety of online and offline adaptive algorithms for aggregating experts’ predictions of the outcomes of five years of US National Football League games (1319 games) using expert probability elicitations obtained from an Internet contest called ProbabilitySports. We find that it is difficult to improve over simple averaging of the predictions in terms of prediction accuracy, but that there is room for improvement in quadratic loss. Somewhat surprisingly, a Bayesian estimation algorithm which estimates the variance of each expert’s prediction exhibits the most consistent superior performance over simple averaging among our collection of algorithms.
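The baseline in the abstract above, simple averaging evaluated by prediction accuracy and by quadratic loss, can be sketched as follows. The forecasts and outcomes are made-up illustrative numbers, not data from ProbabilitySports, and the function names are assumptions; the paper's adaptive and Bayesian algorithms are not reproduced here.

```python
def simple_average(expert_forecasts):
    """Average the experts' probabilities game by game."""
    n_experts = len(expert_forecasts)
    return [sum(column) / n_experts for column in zip(*expert_forecasts)]

def quadratic_loss(forecasts, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((f - y) ** 2 for f, y in zip(forecasts, outcomes)) / len(outcomes)

def accuracy(forecasts, outcomes):
    """Fraction of games in which the forecast favors the actual result."""
    hits = sum((f > 0.5) == (y == 1) for f, y in zip(forecasts, outcomes))
    return hits / len(outcomes)

if __name__ == "__main__":
    # Hypothetical data: three experts, four games; 1 means the home team won.
    experts = [
        [0.9, 0.6, 0.3, 0.8],
        [0.6, 0.7, 0.6, 0.4],
        [0.7, 0.2, 0.4, 0.7],
    ]
    outcomes = [1, 1, 0, 1]

    combined = simple_average(experts)
    mean_individual = sum(quadratic_loss(e, outcomes) for e in experts) / len(experts)
    print("average forecast:", combined)
    print("quadratic loss of average:", quadratic_loss(combined, outcomes))
    print("mean individual quadratic loss:", mean_individual)
    print("accuracy of average:", accuracy(combined, outcomes))
```

Because squared error is convex, the quadratic loss of the averaged forecast can never exceed the mean of the individual experts' losses, which is one reason simple averaging is a hard baseline to beat.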