How to Use Expert Advice
Journal of the Association for Computing Machinery, 1997
Cited by 267 (60 self)
Abstract
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
Learning by Transduction
In Uncertainty in Artificial Intelligence, 1998
Cited by 50 (6 self)
Abstract
We describe a method for predicting a classification of an object given classifications of the objects in the training set, assuming that the object/classification pairs are generated by an i.i.d. process from a continuous probability distribution. Our method is a modification of Vapnik's support vector machine; its main novelty is that it gives not only the prediction itself but also a practicable measure of the evidence found in support of that prediction. We also describe a procedure for assigning degrees of confidence to predictions made by the support vector machine. Some experimental results are presented, and possible extensions of the algorithms are discussed.
1 THE PROBLEM
Suppose labeled points $(x_i, y_i)$ ($i = 1, 2, \ldots$), where $x_i \in \mathbb{R}^n$ (our objects are specified by $n$ real-valued attributes) and $y_i \in \{-1, 1\}$, are generated independently from an unknown (but the same for all points) probability distribution. We are given $l$ points $x_i$, $i = 1, \ldots, l$, toge...
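The method attaches to each candidate classification a measure of how well it agrees with the training set under the i.i.d. assumption. The Python sketch below illustrates this transductive idea. Each candidate label for the new object is tried in turn, a "strangeness" score is computed for every example in the augmented set, and the p-value of the label is the fraction of examples at least as strange as the new one. The paper builds its measure on the support vector machine. To keep the example self-contained, the sketch swaps in a simpler nearest-neighbour strangeness score, which is an assumption and not the paper's choice. The names `nn_strangeness` and `transductive_p_values` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, ...]


def nn_strangeness(i: int, xs: Sequence[Point], ys: Sequence[int]) -> float:
    """Nearest-neighbour strangeness of example i (stand-in for an SVM-based score).

    Ratio of the distance to the nearest example with the same label to the
    distance to the nearest example with a different label; larger = stranger.
    """
    same = min(
        (math.dist(xs[i], xs[j]) for j in range(len(xs)) if j != i and ys[j] == ys[i]),
        default=math.inf,
    )
    other = min(
        (math.dist(xs[i], xs[j]) for j in range(len(xs)) if ys[j] != ys[i]),
        default=math.inf,
    )
    if other == 0.0 or same == math.inf:
        return math.inf
    return same / other


def transductive_p_values(
    train_x: Sequence[Point],
    train_y: Sequence[int],
    x_new: Point,
    labels: Sequence[int] = (-1, 1),
) -> Dict[int, float]:
    """p-value of each candidate label for x_new, computed transductively."""
    p_values = {}
    for y in labels:
        # Augment the training set with the new object and the tentative label.
        xs = list(train_x) + [x_new]
        ys = list(train_y) + [y]
        scores = [nn_strangeness(i, xs, ys) for i in range(len(xs))]
        # Fraction of examples (including the new one) at least as strange as the new one.
        p_values[y] = sum(s >= scores[-1] for s in scores) / len(scores)
    return p_values


train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = [-1, -1, -1, 1, 1, 1]
print(transductive_p_values(train_x, train_y, (0.5, 0.5)))
```

For a new point inside the first cluster, the label -1 receives p-value 4/7 and the label 1 receives 1/7, the smallest value possible with seven examples.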
Competitive on-line statistics
International Statistical Review, 1999
Cited by 39 (7 self)
Abstract
A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive on-line algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential statistics). In this approach, which we call “competitive on-line statistics”, it is not assumed that data are generated by some stochastic mechanism; the bounds derived for the performance of competitive on-line statistical procedures are guaranteed to hold (and not just hold with high probability or on the average). This paper reviews some results in this area; the new material in it includes the proofs for the performance of the Aggregating Algorithm in the problem of linear regression with square loss. Keywords: Bayes’s rule, competitive on-line algorithms, linear regression, prequential statistics, worst-case analysis.
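The abstract mentions the Aggregating Algorithm for linear regression with square loss. The Python sketch below assumes the closed form usually given for this case, often called the Vovk-Azoury-Warmuth forecaster. At step $T$ it predicts $\gamma_T = b_{T-1}^\top \left(aI + \sum_{t=1}^{T} x_t x_t^\top\right)^{-1} x_T$, where $b_{T-1} = \sum_{t=1}^{T-1} y_t x_t$. Unlike ridge regression, the current object $x_T$ enters the matrix before the prediction is made. The function names `solve` and `aggregating_regression` and the regularization parameter `a` are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z


def aggregating_regression(
    xs: Sequence[Sequence[float]], ys: Sequence[float], a: float = 1.0
) -> List[float]:
    """On-line predictions for linear regression with square loss (assumed AA closed form)."""
    n = len(xs[0])
    A = [[a if i == j else 0.0 for j in range(n)] for i in range(n)]  # aI + sum of x x^T
    b = [0.0] * n  # sum of y_t x_t over past trials
    predictions = []
    for x, y in zip(xs, ys):
        # Add the current object to A *before* predicting.
        for i in range(n):
            for j in range(n):
                A[i][j] += x[i] * x[j]
        z = solve(A, list(x))  # z = A^{-1} x_T
        predictions.append(sum(bi * zi for bi, zi in zip(b, z)))
        # Only now is the outcome y_T revealed and added to b.
        for i in range(n):
            b[i] += y * x[i]
    return predictions


# One constant feature, outcomes always 2: predictions 0, 2/3, 1.
print(aggregating_regression([[1.0]] * 3, [2.0] * 3))
```

Because $A$ is symmetric, the prediction $b^\top A^{-1} x$ could equally be computed as $(A^{-1} b)^\top x$. Solving against $x$ keeps the code close to the formula.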
Machine-Learning Applications of Algorithmic Randomness
In Proceedings of the Sixteenth International Conference on Machine Learning, 1999
Cited by 22 (12 self)
Abstract
Most machine learning algorithms share the following drawback: they output only bare predictions, without the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence, but these are, unfortunately, non-computable. In this paper we combine the ideas of algorithmic information theory with the theory of Support Vector machines to obtain practicable approximations to universal measures of confidence. We show that in some standard problems of pattern recognition our approximations work well.
1 INTRODUCTION
Two important differences between most modern methods of machine learning (such as statistical learning theory, see Vapnik [21], 1998, or PAC theory) and classical statistical methods are that:
- machine learning methods produce bare predictions, without estimating confidence in those predictions (unlike, e.g., prediction of future observations in traditional statistics (Guttman [5], 1970));
- many machine learning ...
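The abstract speaks of practicable approximations to universal measures of confidence. In this line of work such approximations are reported through a p-value computed for each possible label. The Python sketch below assumes that setting and shows one summary that can be derived from those p-values. It predicts the label with the largest p-value, reports confidence as one minus the second-largest p-value, and reports credibility as the largest p-value. The function name and the example p-values are hypothetical.

```python
from typing import Dict, Hashable, Tuple


def predict_with_confidence(p_values: Dict[Hashable, float]) -> Tuple[Hashable, float, float]:
    """Turn per-label p-values into (prediction, confidence, credibility).

    prediction  = label with the largest p-value
    confidence  = 1 - second-largest p-value (how firmly the alternatives are rejected)
    credibility = largest p-value (how typical the new example is under the chosen label)
    """
    if len(p_values) < 2:
        raise ValueError("need p-values for at least two labels")
    ranked = sorted(p_values.items(), key=lambda item: item[1], reverse=True)
    (label, best), (_, second) = ranked[0], ranked[1]
    return label, 1.0 - second, best


print(predict_with_confidence({-1: 0.9, 1: 0.05}))
```

A prediction can have high confidence and low credibility at the same time. This happens when every label, including the chosen one, makes the new example look atypical.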
Algorithmic Complexity and Stochastic Properties of Finite Binary Sequences
1999
Cited by 15 (0 self)
Abstract
This paper is a survey of concepts and results related to simple Kolmogorov complexity, prefix complexity and resource-bounded complexity. We also consider a new type of complexity, statistical complexity, which is closely related to mathematical statistics. Unlike the other discoverers of algorithmic complexity, A. N. Kolmogorov was led primarily by the aim of developing, on its basis, a mathematical theory that would more adequately substantiate applications of probability theory, mathematical statistics and information theory. Kolmogorov wanted to deduce properties of a random object from its complexity characteristics without using the notion of probability. In the first part of this paper we present several results in this direction. Although the subsequent development of algorithmic complexity and randomness took a different direction, algorithmic complexity has found successful applications within the traditional probabilistic framework. In the second part of the paper we consider applications to the estimation of parameters and to the definition of Bernoulli sequences. All considerations are of a finite, combinatorial character.
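Kolmogorov complexity itself is not computable, but the finite, combinatorial flavour of the abstract can be illustrated. Take a binary sequence of length $n$ with $k$ ones. Then $\log_2 \binom{n}{k}$ bits suffice to single it out among all sequences with the same $n$ and $k$. In Kolmogorov's combinatorial approach, a Bernoulli (random) sequence is one whose conditional complexity comes close to this bound. The Python sketch below compares the bound with the length of a zlib-compressed encoding. That length is only a crude, computable upper-bound proxy for complexity, valid up to an additive constant, and is not a method from the paper. The function names are hypothetical.

```python
import math
import random
import zlib


def log2_binomial(n: int, k: int) -> float:
    """Bits needed to index one sequence among all length-n binary sequences with k ones."""
    return math.log2(math.comb(n, k))


def compressed_bits(bits: str) -> int:
    """Size in bits of the zlib-compressed sequence: a crude upper-bound proxy for complexity."""
    return 8 * len(zlib.compress(bits.encode("ascii"), 9))


rng = random.Random(0)
regular = "01" * 500                                     # periodic, exactly 500 ones
noisy = "".join(rng.choice("01") for _ in range(1000))   # pseudo-random bits

for name, seq in [("regular", regular), ("noisy", noisy)]:
    n, k = len(seq), seq.count("1")
    print(name, round(log2_binomial(n, k), 1), compressed_bits(seq))
```

The periodic sequence has exactly 500 ones, yet its compressed size falls far below $\log_2\binom{1000}{500} \approx 994.7$ bits. In Kolmogorov's terms, that gap marks it as non-random. For the pseudo-random sequence the compressor finds no comparable shortcut.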
Testing exchangeability on-line
Proceedings of the Twentieth International Conference on Machine Learning, 2003
Cited by 2 (1 self)
Abstract
The practical conclusions of probability theory can be substantiated as consequences of hypotheses about the limiting (under the given constraints) complexity of the phenomena being studied.
Mathematical foundations for probability and causality
In Mathematical Aspects of Artificial Intelligence. Providence, Rhode Island: American Mathematical Society, 1997
Cited by 2 (2 self)
Abstract
Event trees, and more generally, event spaces, can be used to provide a foundation for mathematical probability that includes a systematic understanding of causality. This foundation justifies the use of statistics in causal investigation and provides a rigorous semantics for causal reasoning. Causal reasoning, always important in applied statistics and increasingly important in artificial intelligence, has never been respectable in mathematical treatments of probability. But, as this article shows, a home can be made for causal reasoning in the very foundations of mathematical probability. The key is to bring the event tree, basic to the thinking of Pascal, Huygens, and other pioneers of probability, back into probability’s foundations. An event tree represents the possibilities for the step-by-step evolution of an observer’s knowledge. If that observer is nature, then the steps in the tree are causes. If we add branching probabilities, we obtain a probability tree, which can express nature’s limited ability to predict the effects of causes. As a foundation for the statistical investigation of causality, event and probability trees provide a language for causal explanation, which gives rigorous meaning to causal claims and clarifies the relevance of different kinds of evidence to those claims. As a foundation for probability theory, they allow an elementary treatment of martingales,
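The abstract describes event trees and the probability trees obtained by adding branching probabilities. The Python sketch below builds such a tree, assuming each situation is identified by the path of outcomes so far. `Situation`, `coin_tree`, `expectation` and `probability` are hypothetical names. Computing the expected terminal payoff in every situation gives a process whose value in each situation equals the probability-weighted average of its values in the next situations. That is the martingale property, obtained here by construction.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Situation:
    path: str  # outcomes observed so far, e.g. "HT"
    # (branching probability, next situation); empty for a terminal situation
    branches: List[Tuple[float, "Situation"]] = field(default_factory=list)


def coin_tree(depth: int, p_heads: float = 0.5, path: str = "") -> Situation:
    """Event tree of `depth` coin tosses with branching probabilities attached."""
    if depth == 0:
        return Situation(path)
    return Situation(
        path,
        [
            (p_heads, coin_tree(depth - 1, p_heads, path + "H")),
            (1.0 - p_heads, coin_tree(depth - 1, p_heads, path + "T")),
        ],
    )


def expectation(s: Situation, payoff: Callable[[str], float]) -> float:
    """Expected terminal payoff as seen from situation s (backward induction)."""
    if not s.branches:
        return payoff(s.path)
    return sum(p * expectation(child, payoff) for p, child in s.branches)


def probability(s: Situation, event: Callable[[str], bool]) -> float:
    """Probability, seen from s, that the final path satisfies `event`."""
    return expectation(s, lambda path: 1.0 if event(path) else 0.0)


root = coin_tree(2)
print(probability(root, lambda path: "H" in path))      # 0.75
print(expectation(root, lambda path: path.count("H")))  # 1.0
```

Calling `expectation` on an inner situation, such as the one reached after a first head, gives the updated forecast once that step has happened.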
Pricing European Options Without Probability
1995
Cited by 1 (1 self)
Abstract
It is well known that in the case where the stock price $S_t$ is governed by the equation $dS_t/S_t = \mu\,dt + \sigma\,dW_t$, any European option satisfying weak regularity conditions has a fair price (the Black-Scholes formula and its generalizations). We consider the case where no probabilistic assumptions are made about $S_t$; instead, we assume that the derivative security D, which pays a dividend of $(dS_t/S_t)^2$ (the squared relative increase in the price $S_t$) each instant $dt$, is traded in the market. We prove that the "regular" European options have fair prices provided that both $S_t$ and $D_t$ (the price process of D) are continuous and the fractal dimensions of the graphs of $S_t$ and $D_t$ satisfy certain inequalities. Intuitively, our assumptions are much weaker than the usual assumption $dS_t/S_t = \mu\,dt + \sigma\,dW_t$.
Key Words: Black-Scholes formula, fractal dimension, pathwise stochastic integral, nonstandard analysis
The final version of this paper was prepared for the seminar on the f...
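To contrast with the paper's probability-free setting, the classical result the abstract starts from can be written down. Assume $dS_t/S_t = \mu\,dt + \sigma\,dW_t$ and a constant interest rate $r$; the interest rate is an assumption, since the abstract does not mention one. The Black-Scholes price of a European call with strike $K$ and maturity $T$ is then $C = S_0 N(d_1) - K e^{-rT} N(d_2)$, with $d_{1,2} = \frac{\ln(S_0/K) + (r \pm \sigma^2/2)T}{\sigma\sqrt{T}}$, where $N$ is the standard normal distribution function. The drift $\mu$ does not appear. A minimal Python sketch follows; the function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(
    s0: float, strike: float, sigma: float, maturity: float, rate: float = 0.0
) -> float:
    """Black-Scholes price of a European call in the classical probabilistic model."""
    vol = sigma * math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma**2) * maturity) / vol
    d2 = d1 - vol
    return s0 * normal_cdf(d1) - strike * math.exp(-rate * maturity) * normal_cdf(d2)


# At-the-money call, 20% volatility, one year, zero interest rate.
print(round(black_scholes_call(100.0, 100.0, 0.2, 1.0), 4))  # 7.9656
```

In this model the dividends $(dS_t/S_t)^2$ paid by the paper's security D add up over $[0, T]$ to $\sigma^2 T$. The same quantity enters the formula through `sigma` and `maturity`.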

