Results 1 -
4 of
4
Learning by Transduction
- In Uncertainty in Artificial Intelligence
, 1998
"... We describe a method for predicting a classification of an object given classifications of the objects in the training set, assuming that the pairs object /classification are generated by an i.i.d. process from a continuous probability distribution. Our method is a modification of Vapnik's support-v ..."
Abstract
-
Cited by 50 (6 self)
- Add to MetaCart
We describe a method for predicting a classification of an object given classifications of the objects in the training set, assuming that the pairs object /classification are generated by an i.i.d. process from a continuous probability distribution. Our method is a modification of Vapnik's support-vector machine; its main novelty is that it gives not only the prediction itself but also a practicable measure of the evidence found in support of that prediction. We also describe a procedure for assigning degrees of confidence to predictions made by the support vector machine. Some experimental results are presented, and possible extensions of the algorithms are discussed. 1 THE PROBLEM Suppose labeled points (x i ; y i ) (i = 1; 2; : : :), where x i 2 IR n (our objects are specified by n real-valued attributes) and y i 2 f\Gamma1; 1g, are generated independently from an unknown (but the same for all points) probability distribution. We are given l points x i , i = 1; : : : ; l, toge...
Machine-Learning Applications of Algorithmic Randomness
- In Proceedings of the Sixteenth International Conference on Machine Learning
, 1999
"... Most machine learning algorithms share the following drawback: they only output bare predictions but not the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence but these are, unfortunately, non-computable. In this paper we com ..."
Abstract
-
Cited by 22 (12 self)
- Add to MetaCart
Most machine learning algorithms share the following drawback: they only output bare predictions but not the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence but these are, unfortunately, non-computable. In this paper we combine the ideas of algorithmic information theory with the theory of Support Vector machines to obtain practicable approximations to universal measures of confidence. We show that in some standard problems of pattern recognition our approximations work well. 1 INTRODUCTION Two important differences of most modern methods of machine learning (such as statistical learning theory, see Vapnik [21], 1998, or PAC theory) from classical statistical methods are that: ffl machine learning methods produce bare predictions, without estimating confidence in those predictions (unlike, eg, prediction of future observations in traditional statistics (Guttman [5], 1970)); ffl many machine learning ...
Hedging predictions in machine learning
- Comput. J
, 2007
"... Recent advances in machine learning make it possible to design efficient prediction algorithms for data sets with huge numbers of parameters. This article describes a new technique for ‘hedging ’ the predictions output by many such algorithms, including support vector machines, kernel ridge regressi ..."
Abstract
-
Cited by 10 (3 self)
- Add to MetaCart
Recent advances in machine learning make it possible to design efficient prediction algorithms for data sets with huge numbers of parameters. This article describes a new technique for ‘hedging ’ the predictions output by many such algorithms, including support vector machines, kernel ridge regression, kernel nearest neighbours, and by many other state-of-the-art methods. The hedged predictions for the labels of new objects include quantitative measures of their own accuracy and reliability. These measures are provably valid under the assumption of randomness, traditional in machine learning: the objects and their labels are assumed to be generated independently from the same probability distribution. In particular, it becomes possible to control (up to statistical fluctuations) the number of erroneous predictions by selecting a suitable confidence level. Validity being achieved automatically, the remaining goal of hedged prediction is efficiency: taking full account of the new objects ’ features and other available information to produce as accurate predictions as possible. This can be done successfully using the powerful machinery of modern machine learning. 1
Evolutionary computations based on bayesian classifiers
- INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE
, 2004
"... Evolutionary computation is a discipline that has been emerging for at least 40 or 50 years. All methods within this discipline are characterized by maintaining a set of possible solutions (individuals) to make them successively evolve to fitter solutions generation after generation. Examples of evo ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Evolutionary computation is a discipline that has been emerging for at least 40 or 50 years. All methods within this discipline are characterized by maintaining a set of possible solutions (individuals) to make them successively evolve to fitter solutions generation after generation. Examples of evolutionary computation paradigms are the broadly known Genetic Algorithms (GAs) and Estimation of Distribution Algorithms (EDAs). This paper contributes to the further development of this discipline by introducing a new evolutionary computation method based on the learning and later simulation of a Bayesian classifier in every generation. In the method we propose, at each iteration the selected group of individuals of the population is divided into different classes depending on their respective fitness value. Afterwards, a Bayesian classifier—either naive Bayes, seminaive Bayes, tree augmented naive Bayes or a similar one—is learned to model the corresponding supervised classification problem. The simulation of the latter Bayesian classifier provides individuals that form the next generation. Experimental results are presented to compare the performance of this new method with different types of EDAs and GAs. The problems

