## Machine-Learning Applications of Algorithmic Randomness (1999)

Venue: Proceedings of the Sixteenth International Conference on Machine Learning (ICML 1999)

Citations: 23 (13 self-citations)

### BibTeX

```bibtex
@INPROCEEDINGS{Vovk99machine-learningapplications,
  author    = {Volodya Vovk and Alex Gammerman and Craig Saunders},
  title     = {Machine-Learning Applications of Algorithmic Randomness},
  booktitle = {Proceedings of the Sixteenth International Conference on Machine Learning},
  year      = {1999},
  pages     = {444--453},
  publisher = {Morgan Kaufmann}
}
```

### Abstract

Most machine learning algorithms share the following drawback: they output only bare predictions, not the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence, but these are, unfortunately, non-computable. In this paper we combine the ideas of algorithmic information theory with the theory of Support Vector machines to obtain practicable approximations to universal measures of confidence. We show that in some standard problems of pattern recognition our approximations work well. 1 INTRODUCTION Two important differences between most modern methods of machine learning (such as statistical learning theory, see Vapnik [21], 1998, or PAC theory) and classical statistical methods are that: • machine learning methods produce bare predictions, without estimating confidence in those predictions (unlike, e.g., prediction of future observations in traditional statistics (Guttman [5], 1970)); • many machine learning ...
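The abstract's programme of attaching a confidence to each individual prediction can be illustrated with a transductive p-value computation in the spirit of the paper. The sketch below is not the paper's SVM-based construction: it substitutes a simple distance-to-class-mean nonconformity score for the SVM-derived one, and all names are illustrative.

```python
import numpy as np

def p_value(train_x, train_y, new_x, postulated_y):
    """Transductive p-value for the postulated label of new_x.

    Nonconformity score: each example's distance to the mean of its
    own class (a stand-in for the SVM-based scores used in the paper).
    """
    xs = np.vstack([train_x, new_x])
    ys = np.append(train_y, postulated_y)
    scores = np.empty(len(ys))
    for i in range(len(ys)):
        same_class = xs[ys == ys[i]]
        scores[i] = np.linalg.norm(xs[i] - same_class.mean(axis=0))
    # p-value: fraction of examples at least as "strange" as the new one
    return np.mean(scores >= scores[-1])

# Toy two-class data: the postulated label matching the nearby cluster
# should receive a much higher p-value than the wrong one.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.3, size=(20, 2))
b = rng.normal(3.0, 0.3, size=(20, 2))
train_x = np.vstack([a, b])
train_y = np.array([0] * 20 + [1] * 20)
new_x = np.array([0.1, -0.1])                # clearly near class 0
print(p_value(train_x, train_y, new_x, 0))   # large: label 0 plausible
print(p_value(train_x, train_y, new_x, 1))   # small: label 1 implausible
```

A small p-value for a postulated label means that label makes the extended sequence look non-random, which is exactly the confidence information a bare classifier omits.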

### Citations

9811 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...cannot exclude the wrong classification if the data sequence is random and the time n is also random: the wrong sequence (8) also has a random continuation. 6.2 DENSITY ESTIMATION According to Vapnik [20] (1995) (see also Vapnik [21], 1998) there are three main problems of statistical learning theory: pattern recognition; regression estimation; density estimation. As we have seen earlier, the problem ...

1777 | An introduction to Kolmogorov complexity and its applications
- Li, Vitányi
- 1997
Citation Context: ...roximation Principle [24]), which have many practical applications. (It should be noted, however, that algorithmic randomness has been used in the discussions of the MDL principle; see Li and Vitányi [13], 1997, and [12], 1995.) The main advantage of Kolmogorov's notion of randomness in comparison with the earlier definitions (e.g., von Mises's) is that it is applicable to finite sequences and that it p...

588 | Probability and Statistics
- DeGroot, Schervish
- 2002
Citation Context: ...t factor, P_lottery. Remark 2 The difference between Martin-Löf and Levin randomness levels is analogous to the difference between Bayes factors and p-values. The latter is discussed in, e.g., Schervish [19] (Section 4.6.2) and Vovk [23]. It is always possible (though not advisable in practice) to use Levin randomness level as Martin-Löf randomness level. Since randomness deficiencies are defined to with...

557 | Three approaches to the quantitative definition of information
- Kolmogorov
- 1965
Citation Context: ...ndomness. As will become clear later, the problem of assigning confidences to predictions is closely connected to the problem of defining random sequences. The latter problem was solved by Kolmogorov [8] (1965), who based his definition on the existence of the Universal Turing Machine (though it became clear that Kolmogorov's definition does solve the problem of defining random sequences only after ...

342 | The definition of random sequences
- Martin-Löf
- 1966
Citation Context: ...his definition on the existence of the Universal Turing Machine (though it became clear that Kolmogorov's definition does solve the problem of defining random sequences only after Martin-Löf's paper [15], 1966); Kolmogorov's definition moved the notion of randomness from the grey area surrounding probability theory and statistics to mathematical computer science. Kolmogorov believed his notion of ran...

196 | Handwritten digit recognition with a backpropagation network
- LeCun, Boser, et al.
- 1990
Citation Context: ...paper are done for a simple pattern recognition problem of identifying handwritten digits using a database of US postal data of 9300 digits, where each digit is a 16 × 16 vector (cf. LeCun et al [10], 1990). The experiments are conducted for a subset of these data (a training set of 400 examples and 100 test sets of 1 example each), and include the construction of a two-class classifier to separate d...

164 | Grundbegriffe der Wahrscheinlichkeitsrechnung
- Kolmogorov
- 1933
Citation Context: ...nce. Kolmogorov believed his notion of randomness to be a suitable basis for applications of probability. Unfortunately, the fate of this idea was different from that of Kolmogorov's 1933 axioms (Kolmogorov [7], 1933), which are universally accepted as the basis for the theory of probability. The algorithmic notion of randomness has mainly remained of purely mathematical interest and has not become the le...

112 | On the notion of random sequence
- Levin
- 1973
Citation Context: ...it provides degrees of randomness; this is its crucial feature which makes practical applications possible. Later Kolmogorov's definition was developed by, among others, Martin-Löf [15] (1966), Levin [11] (1973) and Gács [2] (1980). The main goal of this paper is to study computable approximations to algorithmic randomness and to apply those approximations to some benchmark datasets. The main technic...

93 | Logical basis for information theory and probability theory
- Kolmogorov
- 1968
Citation Context: ...$C(z \mid A)$; $d^{L}_{A}(z) = \log|A| - K(z \mid A)$, respectively. Theorem 2: If $z$ ranges over $Z^*$, $d^{ML}_{\mathrm{exch}}(z) \stackrel{+}{=} d^{ML}_{\Xi(z)}(z)$ and $d^{L}_{\mathrm{exch}}(z) \stackrel{+}{=} d^{L}_{\Xi(z)}(z)$. This theorem shows that Kolmogorov's [9] (1968) "Bernoulli sequences" are exactly the sequences with a small permutation deficiency in the binary case. To establish a relation between randomness deficiency and permutation deficiency we will...

50 | Predicting {0,1} functions on randomly drawn points
- Haussler, Littlestone, et al.
- 1994
Citation Context: ...1), assuming there is only one example to be classified) with the usual results of statistical learning theory (see, e.g., Vapnik [21], 1998, Chapters 4 and 10) and PAC theory (see, e.g., Haussler et al [6], 1994). Say, Vapnik's ([21], Theorem 10.5; Theorems 10.6 and 10.7 are of a similar form) denominator $l+1$ in $\mathrm{prob(error)} \le \frac{E\,K_{l+1}}{l+1}$ corresponds to our $\log(l+1)$ (when the "direct scale" is used)...

41 | Stochastic Complexity (with discussion)
- Rissanen
- 1987
Citation Context: ..., despite the fact that the notions of Kolmogorov complexity and randomness are extremely closely connected: despite being non-computable, Kolmogorov complexity inspired the MDL and MML principles [16, 26] (and their generalization, the Complexity Approximation Principle [24]), which have many practical applications. (It should be noted, however, that algorithmic randomness has been used in the discussions...

40 | Nonparametric Methods in Statistics
- Fraser
- 1957
Citation Context: ...tive constant). In applications, it is often more convenient to use the "direct scale". Now we will reformulate some of the previous definitions in the "direct scale". We say that a function $t: Z^* \to [0,1]$ is a p-value function w.r. to $\mathbf{P}$ if 1. for all $n \in \mathbb{N}$, all $r \in [0,1]$ and all $P \in \mathbf{P}_n$, $P\{z \in Z^n : t(z) \le r\} \le r$; 2. $t$ is semicomputable from above, in the sense that there exists a computable sequence of co...
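The defining property of a p-value function quoted in the snippet above, $P\{z \in Z^n : t(z) \le r\} \le r$, is satisfied by simple rank-based statistics under any exchangeable distribution. A minimal empirical check (all names illustrative, using the rank of a sequence's last element among i.i.d. uniforms):

```python
import random

def rank_p_value(z):
    """p-value of the last element: fraction of positions whose value
    is >= the last one (a valid p-value under exchangeability)."""
    return sum(1 for v in z if v >= z[-1]) / len(z)

# Empirical check of the defining property P{t(z) <= r} <= r.
random.seed(0)
n, trials, r = 10, 20000, 0.25
hits = 0
for _ in range(trials):
    z = [random.random() for _ in range(n)]
    if rank_p_value(z) <= r:
        hits += 1
print(hits / trials)  # close to, and not above, r = 0.25
```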

20 | Ridge regression learning algorithm in dual variables - Saunders, Gammerman, Vovk - 1998

17 | A logic of probability, with application to the foundations of statistics (with discussion)
- Vovk
- 1993
Citation Context: ...he difference between Martin-Löf and Levin randomness levels is analogous to the difference between Bayes factors and p-values. The latter is discussed in, e.g., Schervish [19] (Section 4.6.2) and Vovk [23]. It is always possible (though not advisable in practice) to use Levin randomness level as Martin-Löf randomness level. Since randomness deficiencies are defined to within an additive constant and ra...

15 | Estimation and inference by compact coding (with discussion)
- Wallace, Freeman
- 1987
Citation Context: ..., despite the fact that the notions of Kolmogorov complexity and randomness are extremely closely connected: despite being non-computable, Kolmogorov complexity inspired the MDL and MML principles [16, 26] (and their generalization, the Complexity Approximation Principle [24]), which have many practical applications. (It should be noted, however, that algorithmic randomness has been used in the discussions...

14 | Exact expressions for some randomness tests
- Gács
- 1980
Citation Context: ...randomness; this is its crucial feature which makes practical applications possible. Later Kolmogorov's definition was developed by, among others, Martin-Löf [15] (1966), Levin [11] (1973) and Gács [2] (1980). The main goal of this paper is to study computable approximations to algorithmic randomness and to apply those approximations to some benchmark datasets. The main technical tool will be Vapni...

12 | Statistical tolerance regions: Classical and Bayesian - Guttman - 1970

9 | Algorithmic entropy (complexity) of finite objects and its application to defining randomness and amount of information, Semiotika i
- V'yugin
- 1981
Citation Context: ...uence minus its Kolmogorov complexity. a unified view of the results in Gammerman et al [4] (1998) and Saunders et al [17] (1999). For excellent reviews of algorithmic information theory, see V'yugin [25] (1994) and Li and Vitányi [13] (1997). 2 ALGORITHMIC THEORY OF RANDOMNESS Typically we will be interested in randomness of a sequence $z = (z_1, \ldots, z_n)$ of elements $z_i \in Z$ of some sample space...

6 | Bayesian diagnostic probabilities without assuming independence of symptoms
- Gammerman, Thatcher
- 1991
Citation Context: ...ty estimation becomes possible when additional assumptions are made. In low-dimensional situations, informative confidence intervals for density estimation are obtained in, e.g., Gammerman and Thatcher [3] (1992). 6.3 REGRESSION There are several possible understandings of the term "regression". One understanding is "regression estimation": we assume that the examples $(x_i, y_i)$ are generated by some...

5 | On the concept of the Bernoulli property - Vovk - 1986

4 | Computational Machine Learning in Theory and Praxis
- Li, Vitányi
- 1995
Citation Context: ...iple [24]), which have many practical applications. (It should be noted, however, that algorithmic randomness has been used in the discussions of the MDL principle; see Li and Vitányi [13], 1997, and [12], 1995.) The main advantage of Kolmogorov's notion of randomness in comparison with the earlier definitions (e.g., von Mises's) is that it is applicable to finite sequences and that it provides degrees ...

4 | Transduction with confidence and credibility
- Saunders, Gammerman, Vovk
- 1999
Citation Context: ...and they are able to deal with extremely high-dimensional hypothesis spaces; cf. Vapnik [21] (1998). In this paper we will further develop the approach of Gammerman et al [4] (1998) and Saunders et al [17] (1999), where the goal is to obtain conf... [Figure 1 caption: If the training set only contains clear 2s and 7s, we would like to attach much lower confidence to the middle image than to the right and left ones.]

3 | Complexity Approximation Principle
- Vovk, Gammerman
- 1999
Citation Context: ...ness are extremely closely connected: despite being non-computable, Kolmogorov complexity inspired the MDL and MML principles [16, 26] (and their generalization, the Complexity Approximation Principle [24]), which have many practical applications. (It should be noted, however, that algorithmic randomness has been used in the discussions of the MDL principle; see Li and Vitányi [13], 1997, and [12], 199...

1 | Learning by transduction
- Gammerman, Vovk, Vapnik
- 1998
Citation Context: ...ssical parametric statistics) and they are able to deal with extremely high-dimensional hypothesis spaces; cf. Vapnik [21] (1998). In this paper we will further develop the approach of Gammerman et al [4] (1998) and Saunders et al [17] (1999), w... [Figure 1 caption: If the training set only contains clear 2s and 7s, we would like to attach much lower confidence to the middle image than to the right and left ones.]

1 | Resource-bounded Kolmogorov complexity and statistical tests
- Longpré
- 1992
Citation Context: ...y this happened are: • algorithmic measures of randomness are non-computable; • little work has been done on computable approximations to Kolmogorov's randomness (one of the exceptions is Longpré [14], 1992); • the algorithmic theory of randomness has been mainly concerned with the case of binary sequences, which is far too restrictive for any practical applications. Remark 1 It is interesting t...