Results 1  10
of
12
Model Selection and the Principle of Minimum Description Length
 Journal of the American Statistical Association
, 1998
"... This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This ..."
Abstract

Cited by 145 (5 self)
 Add to MetaCart
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical as well as the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can coexist and be compared. We illustrate th...
A Game of Prediction with Expert Advice
 Journal of Computer and System Sciences
, 1997
"... We consider the following problem. At each point of discrete time the learner must make a prediction; he is given the predictions made by a pool of experts. Each prediction and the outcome, which is disclosed after the learner has made his prediction, determine the incurred loss. It is known that, u ..."
Abstract

Cited by 103 (7 self)
 Add to MetaCart
We consider the following problem. At each point of discrete time the learner must make a prediction; he is given the predictions made by a pool of experts. Each prediction and the outcome, which is disclosed after the learner has made his prediction, determine the incurred loss. It is known that, under weak regularity, the learner can ensure that his cumulative loss never exceeds cL+ a ln n, where c and a are some constants, n is the size of the pool, and L is the cumulative loss incurred by the best expert in the pool. We find the set of those pairs (c; a) for which this is true.
Data Clustering: 50 Years Beyond KMeans
, 2008
"... Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks: domain, kingdom, phylum, class, etc.). Cluster analysis is the formal study of algorithms and m ..."
Abstract

Cited by 75 (3 self)
 Add to MetaCart
Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks: domain, kingdom, phylum, class, etc.). Cluster analysis is the formal study of algorithms and methods for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is exploratory in nature to find structure in data. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, Kmeans, was first published in 1955. In spite of the fact that Kmeans was proposed over 50 years ago and thousands of clustering algorithms have been published since then, Kmeans is still widely used. This speaks to the difficulty of designing a general purpose clustering algorithm and the illposed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semisupervised clustering, ensemble clustering, simultaneous feature selection, and data clustering and large scale data clustering.
Competitive online statistics
 International Statistical Review
, 1999
"... A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive online algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential sta ..."
Abstract

Cited by 63 (10 self)
 Add to MetaCart
A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive online algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential statistics). In this approach, which we call “competitive online statistics”, it is not assumed that data are generated by some stochastic mechanism; the bounds derived for the performance of competitive online statistical procedures are guaranteed to hold (and not just hold with high probability or on the average). This paper reviews some results in this area; the new material in it includes the proofs for the performance of the Aggregating Algorithm in the problem of linear regression with square loss. Keywords: Bayes’s rule, competitive online algorithms, linear regression, prequential statistics, worstcase analysis.
MachineLearning Applications of Algorithmic Randomness
 In Proceedings of the Sixteenth International Conference on Machine Learning
, 1999
"... Most machine learning algorithms share the following drawback: they only output bare predictions but not the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence but these are, unfortunately, noncomputable. In this paper we com ..."
Abstract

Cited by 23 (13 self)
 Add to MetaCart
Most machine learning algorithms share the following drawback: they only output bare predictions but not the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence but these are, unfortunately, noncomputable. In this paper we combine the ideas of algorithmic information theory with the theory of Support Vector machines to obtain practicable approximations to universal measures of confidence. We show that in some standard problems of pattern recognition our approximations work well. 1 INTRODUCTION Two important differences of most modern methods of machine learning (such as statistical learning theory, see Vapnik [21], 1998, or PAC theory) from classical statistical methods are that: ffl machine learning methods produce bare predictions, without estimating confidence in those predictions (unlike, eg, prediction of future observations in traditional statistics (Guttman [5], 1970)); ffl many machine learning ...
Kolmogorov Complexity: Sources, Theory and Applications
 The Computer Journal
, 1999
"... ing applications based on different ways of approximating Kolmogorov complexity. 2. BEGINNINGS As we have already mentioned, the two main originators of the theory of Kolmogorov complexity were Ray Solomonoff (born 1926) and Andrei Nikolaevich Kolmogorov (1903 1987). The motivations behind their ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
ing applications based on different ways of approximating Kolmogorov complexity. 2. BEGINNINGS As we have already mentioned, the two main originators of the theory of Kolmogorov complexity were Ray Solomonoff (born 1926) and Andrei Nikolaevich Kolmogorov (1903 1987). The motivations behind their work were completely different; Solomonoff was interested in inductive inference and artificial intelligence and Kolmogorov was interested in the foundations of probability theory and, also, of information theory. They arrived, nevertheless, at the same mathematical notion, which is now known as Kolmogorov complexity. In 1964 Solomonoff published his model of inductive inference. He argued that any inference problem can be presented as a problem of extrapolating a very long sequence of binary symbols; `given a very long sequence, represented by T , what is the probability that it will be followed by a ... sequence A?'. Solomonoff assumed
INFORMATION MEASURE FOR MODULARITY IN ENGINEERING DESIGN
, 2004
"... Modular structures are common in complex natural and artificial systems, and the terms “modular” or “modularity” are used throughout the engineering design literature. However, formal ways to measure or quantify modularity are still needed. This paper introduces an informationbased approach to meas ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
Modular structures are common in complex natural and artificial systems, and the terms “modular” or “modularity” are used throughout the engineering design literature. However, formal ways to measure or quantify modularity are still needed. This paper introduces an informationbased approach to measure modularity, built on the relationship between complexity and modularity. In this informationbased measure, a modular structure is encoded as a message describing information contained in the modular structure; the shorter the message, the higher the modularity of the structure. The information measure is dependent on the modeling and representation of the system. Following this basic idea, an approximate expression for the information measure of abstract graph structures is introduced. Since function structures in engineering design are typically represented as abstract graphs, this approach can be used to synthesize favorable modularity in parallel with the design of new systems. Using a genetic algorithm approach, with the reciprocal of the approximate measure as the fitness function, modular configurations are found in abstract graphs.
Complexity Approximation Principle
 Computer Journal
, 1999
"... INTRODUCTION The subject of this note is another inductive principle, which can be regarded as a direct generalization of the minimum description length (MDL) and minimum message length (MML) principles. We will describe the work started at the Computer Learning Research Centre (Royal Holloway, Uni ..."
Abstract

Cited by 3 (2 self)
 Add to MetaCart
INTRODUCTION The subject of this note is another inductive principle, which can be regarded as a direct generalization of the minimum description length (MDL) and minimum message length (MML) principles. We will describe the work started at the Computer Learning Research Centre (Royal Holloway, University of London) related to this new principle, which we call the complexity approximation principle (CAP). Both MDL and MML principles can be interpreted as Kolmogorov complexity approximation principles (as explained in Rissanen [1, 2] and Wallace and Freeman [3]; see also [4]). It is shown in [5] and [6] that it is possible to generalize Kolmogorov complexity to describe the optimal performance in different `games of prediction'. Using this general notion, called predictive complexity,itis straightforward to extend the MDL and MML principles to our more general CAP. In Section 2 we define predictive complexity, in Section 3 several examples are given and in Section 4
Combining model selection procedures for online prediction
 Sankhya A
, 2001
"... SUMMARY. Here we give a technique for online prediction that uses different model selection principles (MSP’s) at different times. The central idea is that each MSP is associated with a collection of models for which it is best suited. This means one can use the data to choose an MSP. Then, the MSP ..."
Abstract

Cited by 3 (2 self)
 Add to MetaCart
SUMMARY. Here we give a technique for online prediction that uses different model selection principles (MSP’s) at different times. The central idea is that each MSP is associated with a collection of models for which it is best suited. This means one can use the data to choose an MSP. Then, the MSP chosen is used with the data to choose a model, and the parameters of the model are estimated so that predictions can be made. Depending on the degree of discrepancy between the predicted values and the actual outcomes one may update the parameters within a model, reuse the MSP to rechoose the model and estimate its parameters, or start all over again rechoosing the MSP. Our main formal result is a theorem which gives conditions under which our technique performs better than always using the same MSP. We also discuss circumstances under which dropping data points may lead to better predictions. 1.
Hierarchical modularity: Decomposition of function structures with minimal description length principle
 In 17th International Conference on Design Theory and Methodology (DTM). ASME
, 2005
"... In engineering design and analysis, complex systems often need to be decomposed into a hierarchical combination of different simple subsystems. It’s necessary to provide formal, computable methods to hierarchically decompose complex structures. Since graph structures are commonly used as modeling me ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
In engineering design and analysis, complex systems often need to be decomposed into a hierarchical combination of different simple subsystems. It’s necessary to provide formal, computable methods to hierarchically decompose complex structures. Since graph structures are commonly used as modeling methods in engineering practice, this paper presents a method to hierarchically decompose graph structures. The Minimal Description Length (MDL) principle is introduced as a measure to compare different decompositions. The best hierarchical decomposition is searched by evolutionary computation methods with newly defined crossover and mutation operators of tree structures. The results on abstract graph without attributes and a real function structure show that the technique is promising. 1