Results 1 - 10
of
10
Model Selection and the Principle of Minimum Description Length
- Journal of the American Statistical Association
, 1998
"... This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This ..."
Abstract
-
Cited by 114 (4 self)
- Add to MetaCart
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical as well as the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can co-exist and be compared. We illustrate th...
A Game of Prediction with Expert Advice
- Journal of Computer and System Sciences
, 1997
"... We consider the following problem. At each point of discrete time the learner must make a prediction; he is given the predictions made by a pool of experts. Each prediction and the outcome, which is disclosed after the learner has made his prediction, determine the incurred loss. It is known that, u ..."
Abstract
-
Cited by 86 (6 self)
- Add to MetaCart
We consider the following problem. At each point of discrete time the learner must make a prediction; he is given the predictions made by a pool of experts. Each prediction and the outcome, which is disclosed after the learner has made his prediction, determine the incurred loss. It is known that, under weak regularity, the learner can ensure that his cumulative loss never exceeds cL+ a ln n, where c and a are some constants, n is the size of the pool, and L is the cumulative loss incurred by the best expert in the pool. We find the set of those pairs (c; a) for which this is true.
Competitive on-line statistics
- International Statistical Review
, 1999
"... A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive on-line algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential sta ..."
Abstract
-
Cited by 39 (7 self)
- Add to MetaCart
A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive on-line algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential statistics). In this approach, which we call “competitive on-line statistics”, it is not assumed that data are generated by some stochastic mechanism; the bounds derived for the performance of competitive on-line statistical procedures are guaranteed to hold (and not just hold with high probability or on the average). This paper reviews some results in this area; the new material in it includes the proofs for the performance of the Aggregating Algorithm in the problem of linear regression with square loss. Keywords: Bayes’s rule, competitive on-line algorithms, linear regression, prequential statistics, worst-case analysis.
Machine-Learning Applications of Algorithmic Randomness
- In Proceedings of the Sixteenth International Conference on Machine Learning
, 1999
"... Most machine learning algorithms share the following drawback: they only output bare predictions but not the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence but these are, unfortunately, non-computable. In this paper we com ..."
Abstract
-
Cited by 22 (12 self)
- Add to MetaCart
Most machine learning algorithms share the following drawback: they only output bare predictions but not the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence but these are, unfortunately, non-computable. In this paper we combine the ideas of algorithmic information theory with the theory of Support Vector machines to obtain practicable approximations to universal measures of confidence. We show that in some standard problems of pattern recognition our approximations work well. 1 INTRODUCTION Two important differences of most modern methods of machine learning (such as statistical learning theory, see Vapnik [21], 1998, or PAC theory) from classical statistical methods are that: ffl machine learning methods produce bare predictions, without estimating confidence in those predictions (unlike, eg, prediction of future observations in traditional statistics (Guttman [5], 1970)); ffl many machine learning ...
Kolmogorov Complexity: Sources, Theory and Applications
- The Computer Journal
, 1999
"... ing applications based on different ways of approximating Kolmogorov complexity. 2. BEGINNINGS As we have already mentioned, the two main originators of the theory of Kolmogorov complexity were Ray Solomonoff (born 1926) and Andrei Nikolaevich Kolmogorov (1903-- 1987). The motivations behind their ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
ing applications based on different ways of approximating Kolmogorov complexity. 2. BEGINNINGS As we have already mentioned, the two main originators of the theory of Kolmogorov complexity were Ray Solomonoff (born 1926) and Andrei Nikolaevich Kolmogorov (1903-- 1987). The motivations behind their work were completely different; Solomonoff was interested in inductive inference and artificial intelligence and Kolmogorov was interested in the foundations of probability theory and, also, of information theory. They arrived, nevertheless, at the same mathematical notion, which is now known as Kolmogorov complexity. In 1964 Solomonoff published his model of inductive inference. He argued that any inference problem can be presented as a problem of extrapolating a very long sequence of binary symbols; `given a very long sequence, represented by T , what is the probability that it will be followed by a ... sequence A?'. Solomonoff assumed
Complexity Approximation Principle
- Computer Journal
, 1999
"... INTRODUCTION The subject of this note is another inductive principle, which can be regarded as a direct generalization of the minimum description length (MDL) and minimum message length (MML) principles. We will describe the work started at the Computer Learning Research Centre (Royal Holloway, Uni ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
INTRODUCTION The subject of this note is another inductive principle, which can be regarded as a direct generalization of the minimum description length (MDL) and minimum message length (MML) principles. We will describe the work started at the Computer Learning Research Centre (Royal Holloway, University of London) related to this new principle, which we call the complexity approximation principle (CAP). Both MDL and MML principles can be interpreted as Kolmogorov complexity approximation principles (as explained in Rissanen [1, 2] and Wallace and Freeman [3]; see also [4]). It is shown in [5] and [6] that it is possible to generalize Kolmogorov complexity to describe the optimal performance in different `games of prediction'. Using this general notion, called predictive complexity,itis straightforward to extend the MDL and MML principles to our more general CAP. In Section 2 we define predictive complexity, in Section 3 several examples are given and in Section 4
Combining model selection procedures for online prediction
- Sankhya A
, 2001
"... SUMMARY. Here we give a technique for online prediction that uses different model selection principles (MSP’s) at different times. The central idea is that each MSP is associated with a collection of models for which it is best suited. This means one can use the data to choose an MSP. Then, the MSP ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
SUMMARY. Here we give a technique for online prediction that uses different model selection principles (MSP’s) at different times. The central idea is that each MSP is associated with a collection of models for which it is best suited. This means one can use the data to choose an MSP. Then, the MSP chosen is used with the data to choose a model, and the parameters of the model are estimated so that predictions can be made. Depending on the degree of discrepancy between the predicted values and the actual outcomes one may update the parameters within a model, re-use the MSP to rechoose the model and estimate its parameters, or start all over again rechoosing the MSP. Our main formal result is a theorem which gives conditions under which our technique performs better than always using the same MSP. We also discuss circumstances under which dropping data points may lead to better predictions. 1.
Hierarchical modularity: Decomposition of function structures with minimal description length principle
- In 17th International Conference on Design Theory and Methodology (DTM). ASME
, 2005
"... In engineering design and analysis, complex systems often need to be decomposed into a hierarchical combination of different simple subsystems. It’s necessary to provide formal, computable methods to hierarchically decompose complex structures. Since graph structures are commonly used as modeling me ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
In engineering design and analysis, complex systems often need to be decomposed into a hierarchical combination of different simple subsystems. It’s necessary to provide formal, computable methods to hierarchically decompose complex structures. Since graph structures are commonly used as modeling methods in engineering practice, this paper presents a method to hierarchically decompose graph structures. The Minimal Description Length (MDL) principle is introduced as a measure to compare different decompositions. The best hierarchical decomposition is searched by evolutionary computation methods with newly defined crossover and mutation operators of tree structures. The results on abstract graph without attributes and a real function structure show that the technique is promising. 1
Statistical Computing
- Proeeedings of the ACM Workshop on Interaetive 3D Graphies
, 1988
"... this article is as follows. In Section 2 we further motivate the importance of structural zeros. Section 3 introduces a heuristic test statistic based on this idea. Section 4 describes an experiment to test the statistic and gives results. Section 5 concludes with a discussion. To facilitate explana ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
this article is as follows. In Section 2 we further motivate the importance of structural zeros. Section 3 introduces a heuristic test statistic based on this idea. Section 4 describes an experiment to test the statistic and gives results. Section 5 concludes with a discussion. To facilitate explanations in later sections we discuss at this point our data and make some definitions.
New Delhi
"... The vast amount of hidden data in huge databases has created tremendous interests in the field of data mining. This paper discusses the data analytical tools and data mining techniques to analyze the medical data as well as spatial data. Spatial data mining includes discovery of interesting and usef ..."
Abstract
- Add to MetaCart
The vast amount of hidden data in huge databases has created tremendous interests in the field of data mining. This paper discusses the data analytical tools and data mining techniques to analyze the medical data as well as spatial data. Spatial data mining includes discovery of interesting and useful patterns from spatial databases by grouping the objects into clusters. This study focuses on discrete and continuous spatial medical databases on which clustering techniques are applied and the efficient clusters were formed. The clusters of arbitrary shapes are formed if the data is continuous in nature. Furthermore, this application investigated data mining techniques such as classical clustering and hierarchical clustering on the spatial data set to generate the efficient clusters. The experimental results showed that there are certain facts that are evolved and can not be superficially retrieved from raw data.

