Results 1  10
of
146
How to Use Expert Advice
 JOURNAL OF THE ASSOCIATION FOR COMPUTING MACHINERY
, 1997
"... We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worstcase situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the ..."
Abstract

Cited by 317 (66 self)
 Add to MetaCart
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worstcase situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show howthis leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
Strictly Proper Scoring Rules, Prediction, and Estimation
, 2007
"... Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he ..."
Abstract

Cited by 143 (17 self)
 Add to MetaCart
Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ̸ = F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the Savage representation. Examples of scoring rules for probabilistic forecasts in the form of predictive densities include the logarithmic, spherical, pseudospherical, and quadratic scores. The continuous ranked probability score applies to probabilistic forecasts that take the form of predictive cumulative distribution functions. It generalizes the absolute error and forms a special case of a new and very general type of score, the energy score. Like many other scoring rules, the energy score admits a kernel representation in terms of negative definite functions, with links to inequalities of Hoeffding type, in both univariate and multivariate settings. Proper scoring rules for quantile and interval forecasts are also discussed. We relate proper scoring rules to Bayes factors and to crossvalidation, and propose a novel form of crossvalidation known as randomfold crossvalidation. A case study on probabilistic weather forecasts in the North American Pacific Northwest illustrates the importance of propriety. We note optimum score approaches to point and quantile
SCHAPIRE: Adaptive game playing using multiplicative weights
 Games and Economic Behavior
, 1999
"... We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the mult ..."
Abstract

Cited by 134 (14 self)
 Add to MetaCart
We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the multiplicativeweight methods of Littlestone and Warmuth, is analyzed using the Kullback–Liebler divergence. This analysis yields a new, simple proof of the min–max theorem, as well as a provable method of approximately solving a game. A variant of our gameplaying algorithm is proved to be optimal in a very strong sense. Journal of Economic Literature
Assessment and Propagation of Model Uncertainty
, 1995
"... this paper I discuss a Bayesian approach to solving this problem that has long been available in principle but is only now becoming routinely feasible, by virtue of recent computational advances, and examine its implementation in examples that involve forecasting the price of oil and estimating the ..."
Abstract

Cited by 108 (0 self)
 Add to MetaCart
this paper I discuss a Bayesian approach to solving this problem that has long been available in principle but is only now becoming routinely feasible, by virtue of recent computational advances, and examine its implementation in examples that involve forecasting the price of oil and estimating the chance of catastrophic failure of the U.S. Space Shuttle.
A Game of Prediction with Expert Advice
 Journal of Computer and System Sciences
, 1997
"... We consider the following problem. At each point of discrete time the learner must make a prediction; he is given the predictions made by a pool of experts. Each prediction and the outcome, which is disclosed after the learner has made his prediction, determine the incurred loss. It is known that, u ..."
Abstract

Cited by 106 (7 self)
 Add to MetaCart
We consider the following problem. At each point of discrete time the learner must make a prediction; he is given the predictions made by a pool of experts. Each prediction and the outcome, which is disclosed after the learner has made his prediction, determine the incurred loss. It is known that, under weak regularity, the learner can ensure that his cumulative loss never exceeds cL+ a ln n, where c and a are some constants, n is the size of the pool, and L is the cumulative loss incurred by the best expert in the pool. We find the set of those pairs (c; a) for which this is true.
Benchmark Priors for Bayesian Model Averaging
 FORTHCOMING IN THE JOURNAL OF ECONOMETRICS
, 2001
"... In contrast to a posterior analysis given a particular sampling model, posterior model probabilities in the context of model uncertainty are typically rather sensitive to the specification of the prior. In particular, “diffuse” priors on modelspecific parameters can lead to quite unexpected consequ ..."
Abstract

Cited by 94 (5 self)
 Add to MetaCart
In contrast to a posterior analysis given a particular sampling model, posterior model probabilities in the context of model uncertainty are typically rather sensitive to the specification of the prior. In particular, “diffuse” priors on modelspecific parameters can lead to quite unexpected consequences. Here we focus on the practically relevant situation where we need to entertain a (large) number of sampling models and we have (or wish to use) little or no subjective prior information. We aim at providing an “automatic” or “benchmark” prior structure that can be used in such cases. We focus on the Normal linear regression model with uncertainty in the choice of regressors. We propose a partly noninformative prior structure related to a Natural Conjugate gprior specification, where the amount of subjective information requested from the user is limited to the choice of a single scalar hyperparameter g0j. The consequences of different choices for g0j are examined. We investigate theoretical properties, such as consistency of the implied Bayesian procedure. Links with classical information criteria are provided. More importantly, we examine the finite sample implications of several choices of g0j in a simulation study. The use of the MC3 algorithm of Madigan and York (1995), combined with efficient coding in Fortran, makes it feasible to conduct large simulations. In addition to posterior criteria, we shall also compare the predictive performance of different priors. A classic example concerning the economics of crime will also be provided and contrasted with results in the literature. The main findings of the paper will lead us to propose a “benchmark” prior specification in a linear regression context with model uncertainty.
Asymptotic calibration
 Biometrika
, 1998
"... Can we forecast the probability of an arbitrary sequence of events happening so that the stated probability of an event happening is close to its empirical probability? We can view this prediction problem as a game played against nature, where at the beginning of the game Nature picks a data sequenc ..."
Abstract

Cited by 74 (4 self)
 Add to MetaCart
Can we forecast the probability of an arbitrary sequence of events happening so that the stated probability of an event happening is close to its empirical probability? We can view this prediction problem as a game played against nature, where at the beginning of the game Nature picks a data sequence and the forecaster picks a forecasting algorithm. If the forecaster is not allowed to randomize, then Nature win; there will always be data for which the forecaster does poorly. This paper shows that, if the forecaster can randomize, the forecaster wins in the sense that the forecasted probabilities and the empirical probabilities can be made arbitrarily close to each other.
Multivariate Density Forecast Evaluation and Calibration
 in Financial Risk Management: HighFrequency Returns on Foreign Exchange,” Review of Economics and Statistics
, 1999
"... educational and research purposes, so long as it is not altered, this copyright notice is reproduced with it, and it is not sold for profit. Abstract: We provide a framework for evaluating and improving multivariate density forecasts. Among other things, the multivariate framework lets us evaluate t ..."
Abstract

Cited by 72 (15 self)
 Add to MetaCart
educational and research purposes, so long as it is not altered, this copyright notice is reproduced with it, and it is not sold for profit. Abstract: We provide a framework for evaluating and improving multivariate density forecasts. Among other things, the multivariate framework lets us evaluate the adequacy of density forecasts involving crossvariable interactions, such as timevarying conditional correlations. We also provide conditions under which a technique of density forecast “calibration ” can be used to improve deficient density forecasts. Finally, motivated by recent advances in financial risk management, we provide a detailed application to multivariate highfrequency exchange rate density forecasts.
Dynamic Conditional Independence Models And Markov Chain Monte Carlo Methods
 Journal of the American Statistical Association
, 1997
"... In dynamic statistical modeling situations, observations arise sequentially, causing the model to expand by progressive incorporation of new data items and new unknown parameters. For example, in clinical monitoring, new patientspecific parameters are introduced with each new patient. Markov chain ..."
Abstract

Cited by 71 (0 self)
 Add to MetaCart
In dynamic statistical modeling situations, observations arise sequentially, causing the model to expand by progressive incorporation of new data items and new unknown parameters. For example, in clinical monitoring, new patientspecific parameters are introduced with each new patient. Markov chain Monte Carlo (MCMC) might be used for posterior inference, but would need to be redone at each expansion stage. Thus such methods are often too slow for realtime implementation. By combining MCMC with importanceresampling, we show how realtime posterior updating can be effected. The proposed dynamic sampling algorithms utilize posterior samples from previous expansion stages, and exploit conditional independence between groups of parameters to allow samples of parameters no longer of interest to be discarded, such as when a patient dies or is discharged. We apply the methods to monitoring of heart transplant recipients during infection from cytomegalovirus. KEY WORDS : Bayesian Inference, ...