Universal Prediction
- IEEE Transactions on Information Theory, 1998
Cited by 99 (6 self)
This paper consists of an overview of universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression.
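As a rough illustration of self-information loss, the sketch below scores a sequential probability assignment on a binary sequence. The add-one (Laplace) rule is an assumption chosen here for simplicity, not a method taken from the paper, and the function names are hypothetical.

```python
import math


def laplace_predictor(history):
    """Probability that the next bit is 1 under the add-one (Laplace) rule.

    This predictor is an illustrative assumption, not the paper's scheme.
    """
    return (sum(history) + 1) / (len(history) + 2)


def cumulative_log_loss(bits):
    """Total self-information loss, in bits, of the sequential predictions."""
    total = 0.0
    for t, x in enumerate(bits):
        p_one = laplace_predictor(bits[:t])
        p_x = p_one if x == 1 else 1.0 - p_one
        total += -math.log2(p_x)
    return total


sequence = [1, 1, 0, 1, 1, 1, 0, 1]
loss = cumulative_log_loss(sequence)
```

The accumulated loss equals the negative base-2 logarithm of the probability the predictor assigns to the whole sequence, which is the ideal code length an arithmetic coder driven by the same probabilities would approach. That is the link to data compression the abstract mentions.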
A tutorial introduction to the minimum description length principle
- in Advances in Minimum Description Length: Theory and Applications, 2005
Probabilistic forecasts, calibration and sharpness
- Journal of the Royal Statistical Society Series B, 2007
Cited by 24 (11 self)
Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only. A simple theoretical framework allows us to distinguish between probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest. In combination with cross-validation or in the time series context, our proposal provides very general, nonparametric alternatives to the use of information criteria for model diagnostics and model selection.
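One of the tools named in the abstract, the probability integral transform (PIT) histogram, can be sketched as follows. The sketch assumes Gaussian predictive distributions and uses hypothetical function names; the paper itself is not restricted to that case.

```python
import math


def normal_cdf(x, mean, std):
    """CDF of a normal predictive distribution evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def pit_values(observations, means, stds):
    """PIT value of each observation under its own predictive distribution."""
    return [normal_cdf(y, m, s) for y, m, s in zip(observations, means, stds)]


def pit_histogram(pits, bins=10):
    """Count PIT values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for u in pits:
        index = min(int(u * bins), bins - 1)  # put u == 1.0 in the last bin
        counts[index] += 1
    return counts
```

A roughly flat histogram is consistent with probabilistic calibration. A U-shaped histogram suggests predictive distributions that are too narrow, and a hump-shaped one suggests distributions that are too wide.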
Algorithmic Complexity and Stochastic Properties of Finite Binary Sequences
- 1999
Cited by 15 (0 self)
This paper is a survey of concepts and results related to simple Kolmogorov complexity, prefix complexity and resource-bounded complexity. We also consider a new type of complexity, statistical complexity, which is closely related to mathematical statistics. Unlike other discoverers of algorithmic complexity, A. N. Kolmogorov's leading motive was to develop on its basis a mathematical theory that more adequately substantiates applications of probability theory, mathematical statistics and information theory. Kolmogorov wanted to deduce properties of a random object from its complexity characteristics without use of the notion of probability. In the first part of this paper we present several results in this direction. Though the subsequent development of algorithmic complexity and randomness was different, algorithmic complexity has successful applications in a traditional probabilistic framework. In the second part of the paper we consider applications to the estimation of parameters and the definition of Bernoulli sequences. All considerations have finite combinatorial character.
On Optimal Sequential Prediction for General Processes
- IEEE Transactions on Information Theory, 2001
Cited by 6 (1 self)
In the stochastic sequential prediction problem, the elements of a random process $X_1, X_2, \ldots \in \mathbb{R}$ are successively revealed to a forecaster. At each time $t$ the forecaster makes a prediction $F_t$ of $X_t$ based only on $X_1, \ldots, X_{t-1}$; when $X_t$ is revealed, the forecaster incurs a loss $\ell(F_t, X_t)$. This paper considers several aspects of the sequential prediction problem for unbounded, non-stationary processes under $p$-th power loss, $1 < p < \infty$. In the first part of the paper it is shown that Bayes prediction schemes are Cesaro optimal under general conditions, that Cesaro optimal prediction schemes are unique in a natural sense, and that Cesaro optimality is equivalent to a form of weak calibration. Extensions of the existence and uniqueness results to generalized prediction, and prediction from observations with additive noise, are established.
Handling Uncertainty When You're Handling Uncertainty: Model Selection and Error Bars for Belief Networks
- 2000
Cited by 3 (1 self)
Belief networks are a common way of handling uncertainty in AI. A belief network represents the joint distribution of a set of random variables. When network parameters are estimated from a sample, the parameter values are also random variables whose distribution is given by the sampling distribution of the true model (Frequentist perspective) or the posterior distribution over the parameter space (Bayesian perspective). The uncertainty in parameter values has implications for both inference and learning. In learning network structure from data, a fundamental issue is how to handle the bias-variance trade-off -- increasing model complexity decreases bias but increases the variance in parameter values. We compare model selection criteria for handling the bias-variance trade-off in structure learning, on theoretical and empirical grounds. We also look at the issue of the uncertainty in belief network inference. Once constructed, belief networks are typically used to answer queries about mar...
On Optimal Sequential Decisions Schemes for General Processes
- IEEE Transactions on Information Theory, 2000
Cited by 1 (1 self)
In the stochastic sequential decision problem the elements of a random process $X_1, X_2, \ldots$ are successively revealed to a decision scheme. At each time $t \ge 1$ the scheme takes an action $F_t$ based on the observed values of $X_1, \ldots, X_{t-1}$; when $X_t$ is revealed, the scheme incurs loss $\ell(F_t, X_t)$. The first part of the paper is devoted to some basic properties of Cesaro and strongly optimal decision schemes for general processes and strictly convex loss functions. It is shown in each case that optimal schemes are unique in a natural sense, and that optimality is equivalent to a form of calibration. For binary processes it is shown that thresholding an optimal prediction scheme for the squared loss yields an optimal binary prediction scheme for the Hamming loss. In the second part of the paper it is shown how to construct, from a countable family of candidate decision schemes, a single composite scheme whose asymptotic performance is as good as that of any member of the family. ...
A Bayesian Technique for Estimating the Credibility of Question Answerers
Cited by 1 (0 self)
We address the problem of ranking question answerers according to their credibility, characterized here by the probability that a given question answerer (user) will be awarded a best answer on a question given the answerer’s question-answering history. This probability (represented by θ) is considered to be a hidden variable that can only be estimated statistically from specific observations associated with the user, namely the number b of best answers awarded, associated with the number n of questions answered. The more specific problem addressed is the potentially high degree of uncertainty associated with such credibility estimates when they are based on small numbers of answers. We address this problem by a kind of Bayesian smoothing. The credibility estimate will consist of a mixture of the overall population statistics and those of the specific user. The greater the number of questions asked, the greater will be the contribution of the specific user statistics relative to those of the overall population. We use the Predictive Stochastic Complexity (PSC) as an accuracy measure to evaluate several methods that can be used for the estimation. We compare our technique (Bayesian Smoothing (BS)) with maximum a posteriori (MAP) estimation, maximum likelihood (ML) estimation and Laplace smoothing.
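The mixture of population and per-user statistics described in this abstract can be sketched with a Beta prior centered on the population rate. This is a minimal sketch under stated assumptions, not the paper's estimator: the `population_rate` and `prior_strength` parameters and both function names are hypothetical.

```python
def smoothed_credibility(b, n, population_rate, prior_strength=10.0):
    """Posterior-mean estimate of theta from b best answers out of n answers.

    Assumes a Beta prior with mean population_rate and pseudo-count
    prior_strength; both are illustrative choices, not values from the paper.
    """
    if n < 0 or b < 0 or b > n:
        raise ValueError("require 0 <= b <= n")
    return (b + prior_strength * population_rate) / (n + prior_strength)


def user_weight(n, prior_strength=10.0):
    """Share of the estimate contributed by the user's own ratio b / n."""
    return n / (n + prior_strength)
```

With no answers the estimate equals the population rate, and as n grows the user's own ratio b / n dominates, which matches the behaviour the abstract describes. Setting `population_rate=0.5` and `prior_strength=2` recovers Laplace (add-one) smoothing, one of the baselines the abstract lists.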

