A tutorial introduction to the minimum description length principle
In Advances in Minimum Description Length: Theory and Applications, 2005
Probabilistic forecasts, calibration and sharpness
Journal of the Royal Statistical Society Series B, 2007
Abstract

Cited by 38 (15 self)
Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only. A simple theoretical framework allows us to distinguish between probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest. In combination with cross-validation or in the time series context, our proposal provides very general, nonparametric alternatives to the use of information criteria for model diagnostics and model selection.
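To make the PIT histogram and proper scoring rules concrete, the following Python sketch assumes Gaussian predictive distributions and synthetic data; it does not reproduce the paper's wind-speed study, and the function names are illustrative. `pit_values` evaluates each predictive CDF at the realized observation, `pit_histogram` bins the results (a roughly flat histogram is consistent with probabilistic calibration, while a hump in the middle points to overdispersed forecasts), and `crps_normal` uses the standard closed form of the continuous ranked probability score for a normal forecast.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """Predictive CDF of a Gaussian forecast N(mu, sigma^2), evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def pit_values(forecasts, observations):
    """Probability integral transform: each predictive CDF evaluated at its observation."""
    return [normal_cdf(y, mu, sigma) for (mu, sigma), y in zip(forecasts, observations)]

def pit_histogram(pits, bins=10):
    """Counts of PIT values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for u in pits:
        counts[min(int(u * bins), bins - 1)] += 1
    return counts

def crps_normal(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) at observation y; lower is better."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def mean_crps(forecasts, observations):
    """Average score over a sample, usable for ranking competing forecasters."""
    scores = [crps_normal(mu, sigma, y) for (mu, sigma), y in zip(forecasts, observations)]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    rng = random.Random(0)
    obs = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # synthetic data, not wind speeds
    ideal = [(0.0, 1.0)] * len(obs)     # matches the data-generating distribution
    too_wide = [(0.0, 3.0)] * len(obs)  # overdispersed: PIT values pile up mid-range
    for name, fc in (("ideal", ideal), ("too_wide", too_wide)):
        print(name, pit_histogram(pit_values(fc, obs)), round(mean_crps(fc, obs), 3))
```

Ranking by mean CRPS rewards sharpness only together with calibration: in this setup the overdispersed forecast should receive the larger (worse) average score.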
The influence limiter: Provably manipulation-resistant recommender systems
To appear in Proceedings of the ACM Recommender Systems Conference (RecSys07), 2007
Abstract

Cited by 19 (8 self)
This appendix should be read in conjunction with the article by Resnick and Sami [1]. Here, we include the proofs that were omitted from the main article due to shortage of space. A.1 Lemma 5. For the quadratic scoring rule (MSE) loss, for all $q, u \in [0,1]$, $GF(q\|u) \ge D(q\|u)/2$. Proof of Lemma 5: Because both $D(q\|u) = D(1-q\|1-u)$ and $GF(q\|u) = GF(1-q\|1-u)$, we can assume $u \ge q$ without loss of generality. Keeping $q$ fixed, we want to show that the result holds for all $u$. Note that $D(q\|q) = GF(q\|q) = 0$. Thus, differentiating with respect to $u$, it is sufficient to prove that $GF'(q\|u) \ge D'(q\|u)/2$ for all $u$ with $q \le u \le 1$. We change variables by setting $y = u - q$. We use the notation $D'(y)$ to denote $D'(q\|u)\big|_{u=q+y}$, treating $q$ as fixed and implicit. Likewise, we use the notation $GF'(y)$. For brevity, we use $\bar{q}$ to denote $1 - q$.
$$D(q\|u) = q\big[(\bar{q} - y)^2 - \bar{q}^2\big] + \bar{q}\big[(q + y)^2 - q^2\big] = q\big[y^2 - 2\bar{q}y\big] + \bar{q}\big[y^2 + 2qy\big] = y^2 \;\Rightarrow\; D'(y) = 2y$$
$$GF(q\|u) = q\log\big(1 + y^2 - 2\bar{q}y\big) + \bar{q}\log\big(1 + y^2 + 2qy\big)$$
A tutorial on conformal prediction
Journal of Machine Learning Research, 2008
Abstract

Cited by 18 (1 self)
Conformal prediction uses past experience to determine precise levels of confidence in new predictions. Given an error probability ε, together with a method that makes a prediction ŷ of a label y, it produces a set of labels, typically containing ŷ, that also contains y with probability 1 − ε. Conformal prediction can be applied to any method for producing ŷ: a nearest-neighbor method, a support-vector machine, ridge regression, etc. Conformal prediction is designed for an online setting in which labels are predicted successively, each one being revealed before the next is predicted. The most novel and valuable feature of conformal prediction is that if the successive examples are sampled independently from the same distribution, then the successive predictions will be right 1 − ε of the time, even though they are based on an accumulating data set rather than on independent data sets. In addition to the model under which successive examples are sampled independently, other online compression models can also use conformal prediction. The widely used Gaussian linear model is one of these. This tutorial presents a self-contained account of the theory of conformal prediction and works through several numerical examples. A more comprehensive treatment of the topic is provided in
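To illustrate the mechanism, the Python sketch below implements a transductive conformal predictor for classification with a nearest-neighbor nonconformity measure: the distance to the nearest example with the same label divided by the distance to the nearest example with a different label. For each candidate label it appends the new example to the bag, computes a p-value as the fraction of examples at least as nonconforming as the new one, and keeps the labels whose p-value exceeds ε. The function names and this particular measure are illustrative choices, not the tutorial's code; the 1 − ε guarantee relies on the examples being exchangeable, which holds under the independent sampling described above.

```python
import math

def nn_score(i, xs, ys):
    """Nonconformity of example i: nearest same-label distance / nearest other-label distance."""
    same = min((math.dist(xs[i], xs[j]) for j in range(len(xs)) if j != i and ys[j] == ys[i]),
               default=math.inf)
    other = min((math.dist(xs[i], xs[j]) for j in range(len(xs)) if ys[j] != ys[i]),
                default=math.inf)
    if other == 0.0 or math.isinf(same):
        return math.inf  # maximally strange: no same-label neighbor, or an identical point with another label
    if math.isinf(other):
        return 0.0       # no competing label in the bag at all
    return same / other

def p_value(train_x, train_y, x_new, y_cand):
    """Fraction of examples in the augmented bag at least as nonconforming as (x_new, y_cand)."""
    xs = list(train_x) + [x_new]
    ys = list(train_y) + [y_cand]
    scores = [nn_score(i, xs, ys) for i in range(len(xs))]
    return sum(s >= scores[-1] for s in scores) / len(scores)

def prediction_set(train_x, train_y, x_new, labels, epsilon):
    """All candidate labels whose conformal p-value exceeds the error probability epsilon."""
    return {y for y in labels if p_value(train_x, train_y, x_new, y) > epsilon}

if __name__ == "__main__":
    xs = [(0.0,), (0.2,), (0.4,), (5.0,), (5.2,), (5.4,)]
    ys = ["a", "a", "a", "b", "b", "b"]
    print(prediction_set(xs, ys, (0.1,), {"a", "b"}, epsilon=0.2))
```

With three points labeled `a` near 0 and three labeled `b` near 5, the new point at 0.1 gets p-value 6/7 for `a` and 1/7 for `b`, so ε = 0.2 yields `{'a'}`, while the less demanding ε = 0.1 yields both labels.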
Information, Divergence and Risk for Binary Experiments
Journal of Machine Learning Research, 2009
Abstract

Cited by 17 (6 self)
We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC curves and statistical information. We do this by systematically studying integral and variational representations of these various objects and in so doing identify their primitives, which are all related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants.
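As a small concrete instance of these objects, the Python sketch below computes f-divergences between two binary distributions from their generators f, where f(t) = t log t gives the Kullback-Leibler divergence and f(t) = |t − 1| gives the variational divergence V(P, Q) = Σ|p_i − q_i|, and checks the classical Pinsker inequality KL(P‖Q) ≥ V(P, Q)²/2 on a grid. The classical inequality relates only KL to V; the paper's generalised inequalities cover other f-divergences and are not implemented here. All names are illustrative.

```python
import math

def f_divergence(p, q, f):
    """I_f(P, Q) = sum_i q_i * f(p_i / q_i) for discrete distributions with every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def kl_generator(t):
    """f(t) = t log t yields KL(P || Q); uses the convention 0 log 0 = 0."""
    return t * math.log(t) if t > 0 else 0.0

def variational_generator(t):
    """f(t) = |t - 1| yields the variational divergence sum_i |p_i - q_i|."""
    return abs(t - 1.0)

def binary(a):
    """Distribution of a binary experiment with P(positive) = a."""
    return (a, 1.0 - a)

def pinsker_gap(a, b):
    """KL(P || Q) - V(P, Q)^2 / 2 for binary P, Q; Pinsker's inequality says this is >= 0."""
    P, Q = binary(a), binary(b)
    kl = f_divergence(P, Q, kl_generator)
    v = f_divergence(P, Q, variational_generator)
    return kl - v * v / 2.0

if __name__ == "__main__":
    grid = [i / 20 for i in range(21)]
    worst = min(pinsker_gap(a, b) for a in grid for b in grid if 0.0 < b < 1.0)
    print(f"smallest Pinsker gap on the grid: {worst:.3e}")
```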
A New Understanding of Subjective Probability and Its Generalization to Lower and Upper Prevision
2002
Abstract

Cited by 14 (6 self)
This article introduces a new way of understanding subjective probability and its generalization to lower and upper prevision.
Defensive Forecasting
Abstract

Cited by 13 (12 self)
We consider how to make probability forecasts of binary labels. Our main mathematical result is that for any continuous gambling strategy used for detecting disagreement between the forecasts and the actual labels, there exists a forecasting strategy whose forecasts are ideal as far as this gambling strategy is concerned. A forecasting strategy obtained in this way from a gambling strategy demonstrating a strong law of large numbers is simplified and studied empirically.
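The core mechanism can be sketched in a few lines of Python. Assume a continuous gambling strategy S(p): the stake on label 1 when the forecast is p, so that the gambler gains S(p)(y − p). The forecaster picks p = 0 if S(0) ≤ 0, p = 1 if S(1) ≥ 0, and otherwise a root of S found by bisection; continuity guarantees that the root exists, and in every case the gambler's gain is (essentially) nonpositive whatever label arrives. The Gaussian-kernel stake below is a hypothetical example of a continuous gambling strategy; the abstract's strategy derived from a strong law of large numbers is not reproduced here.

```python
import math

def gauss_kernel(p, q, width=0.1):
    """Similarity between two forecast values, continuous in p (a hypothetical choice)."""
    return math.exp(-((p - q) ** 2) / (2.0 * width ** 2))

def gambler_stake(p, history, width=0.1):
    """Hypothetical continuous gambling strategy S(p): bets that past residuals y_i - p_i
    observed near forecast p will recur."""
    return sum(gauss_kernel(p, p_i, width) * (y_i - p_i) for p_i, y_i in history)

def defensive_forecast(history, width=0.1, tol=1e-10):
    """Pick p in [0, 1] so the gain S(p) * (y - p) is <= 0 (up to tol effects) for y in {0, 1}."""
    def s(p):
        return gambler_stake(p, history, width)
    if s(0.0) <= 0.0:
        return 0.0  # gain s(0) * y is <= 0 for y = 0 and y = 1
    if s(1.0) >= 0.0:
        return 1.0  # gain s(1) * (y - 1) is <= 0 for y = 0 and y = 1
    lo, hi = 0.0, 1.0  # invariant: s(lo) > 0 >= s(hi), so a root lies in [lo, hi]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if s(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)  # S is nearly 0 here, so the gain is nearly 0

if __name__ == "__main__":
    history = []
    for t in range(60):
        p = defensive_forecast(history)
        y = 1 if t % 3 == 0 else 0  # an arbitrary label sequence
        history.append((p, y))
    print("label frequency:", sum(y for _, y in history) / len(history))
    print("mean forecast:  ", sum(p for p, _ in history) / len(history))
```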
Continuous-time trading and emergence of volatility
2007
Abstract

Cited by 11 (6 self)
This note continues the investigation of randomness-type properties emerging in idealized financial markets with continuous price processes. It is shown, without making any probabilistic assumptions, that the strong variation exponent of non-constant price processes has to be 2, as in the case of continuous martingales.
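For intuition about the exponent 2 in the continuous-martingale case that the note compares against, the Python sketch below simulates a Brownian path (a probabilistic assumption the note itself avoids) and sums |ΔX|^p along successively finer dyadic partitions of [0, T]. As the mesh shrinks, the sums should grow for p = 1, shrink toward 0 for p = 3, and stay near T for p = 2. This is only a proxy: strong p-variation is a supremum over all partitions, and for Brownian motion it is finite for every p > 2 but infinite for p = 2, which still gives exponent 2. The function names are illustrative.

```python
import random

def brownian_path(n_steps, T=1.0, seed=0):
    """Simulated Brownian motion at n_steps + 1 equally spaced times in [0, T]."""
    rng = random.Random(seed)
    sd = (T / n_steps) ** 0.5  # standard deviation of each increment
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path

def p_variation_sum(path, p, stride=1):
    """Sum of |increment|^p over the sub-partition keeping every stride-th point.
    stride should divide len(path) - 1 so that the endpoint T is included."""
    pts = path[::stride]
    return sum(abs(b - a) ** p for a, b in zip(pts, pts[1:]))

if __name__ == "__main__":
    path = brownian_path(2 ** 16)
    for level in (8, 12, 16):  # partition into 2**level intervals
        stride = 2 ** (16 - level)
        sums = [round(p_variation_sum(path, p, stride), 4) for p in (1.0, 2.0, 3.0)]
        print(f"2^{level} intervals, p = 1, 2, 3:", sums)
```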