Results 1  10
of
137
Irrelevant Features and the Subset Selection Problem
 MACHINE LEARNING: PROCEEDINGS OF THE ELEVENTH INTERNATIONAL
, 1994
"... We address the problem of finding a subset of features that allows a supervised induction algorithm to induce small highaccuracy concepts. We examine notions of relevance and irrelevance, and show that the definitions used in the machine learning literature do not adequately partition the features ..."
Abstract

Cited by 594 (23 self)
 Add to MetaCart
We address the problem of finding a subset of features that allows a supervised induction algorithm to induce small highaccuracy concepts. We examine notions of relevance and irrelevance, and show that the definitions used in the machine learning literature do not adequately partition the features into useful categories of relevance. We present definitions for irrelevance and for two degrees of relevance. These definitions improve our understanding of the behavior of previous subset selection algorithms, and help define the subset of features that should be sought. The features selected should depend not only on the features and the target concept, but also on the induction algorithm. We describe a method for feature subset selection using crossvalidation that is applicable to any induction algorithm, and discuss experiments conducted with ID3 and C4.5 on artificial and real datasets.
Bayesian Interpolation
 Neural Computation
, 1991
"... Although Bayesian analysis has been in use since Laplace, the Bayesian method of modelcomparison has only recently been developed in depth. In this paper, the Bayesian approach to regularisation and modelcomparison is demonstrated by studying the inference problem of interpolating noisy data. T ..."
Abstract

Cited by 520 (18 self)
 Add to MetaCart
Although Bayesian analysis has been in use since Laplace, the Bayesian method of modelcomparison has only recently been developed in depth. In this paper, the Bayesian approach to regularisation and modelcomparison is demonstrated by studying the inference problem of interpolating noisy data. The concepts and methods described are quite general and can be applied to many other problems. Regularising constants are set by examining their posterior probability distribution. Alternative regularisers (priors) and alternative basis sets are objectively compared by evaluating the evidence for them. `Occam's razor' is automatically embodied by this framework. The way in which Bayes infers the values of regularising constants and noise levels has an elegant interpretation in terms of the effective number of parameters determined by the data set. This framework is due to Gull and Skilling. 1 Data modelling and Occam's razor In science, a central task is to develop and compare models to a...
Survey of clustering data mining techniques
, 2002
"... Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in math ..."
Abstract

Cited by 247 (0 self)
 Add to MetaCart
Accrue Software, Inc. Clustering is a division of data into groups of similar objects. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical perspective clustering plays an outstanding role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. Clustering is the subject of active research in several fields such as statistics, pattern recognition, and machine learning. This survey focuses on clustering in data mining. Data mining adds to clustering the complications of very large datasets with very many attributes of different types. This imposes unique
Minimum Message Length and Kolmogorov Complexity
 Computer Journal
, 1999
"... this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 10381039], [2, sections 5.2, 5.5] and [3, p. 465] ..."
Abstract

Cited by 104 (25 self)
 Add to MetaCart
this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 10381039], [2, sections 5.2, 5.5] and [3, p. 465]
Bestfirst Model Merging for Hidden Markov Model Induction
, 1994
"... This report describes a new technique for inducing the structure of Hidden Markov Models from data which is based on the general `model merging' strategy (Omohundro 1992). The process begins with a maximum likelihood HMM that directly encodes the training data. Successively more general models are p ..."
Abstract

Cited by 93 (7 self)
 Add to MetaCart
This report describes a new technique for inducing the structure of Hidden Markov Models from data which is based on the general `model merging' strategy (Omohundro 1992). The process begins with a maximum likelihood HMM that directly encodes the training data. Successively more general models are produced by merging HMM states. A Bayesian posterior probability criterion is used to determine which states to merge and when to stop generalizing. The procedure may be considered a heuristic search for the HMM structure with the highest posterior probability. We discuss a variety of possible priors for HMMs, as well as a number of approximations which improve the computational efficiency of the algorithm. We studied three applications to evaluate the procedure. The first compares the merging algorithm with the standard BaumWelch approach in inducing simple finitestate languages from small, positiveonly training samples. We found that the merging procedure is more robust and accurate, part...
Bayesian Approaches to Gaussian Mixture Modelling
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1998
"... A Bayesianbased methodology is presented which automatically penalises overcomplex models being fitted to unknown data. We show that, with a Gaussian mixture model, the approach is able to select an `optimal' number of components in the model and so partition data sets. The performance of the Baye ..."
Abstract

Cited by 73 (2 self)
 Add to MetaCart
A Bayesianbased methodology is presented which automatically penalises overcomplex models being fitted to unknown data. We show that, with a Gaussian mixture model, the approach is able to select an `optimal' number of components in the model and so partition data sets. The performance of the Bayesian method is compared to other methods of optimal model selection and found to give good results. The methods are tested on synthetic and real data sets. Introduction Scientific disciplines generate data. In the attempt to understand the patterns present in such data sets methods which perform some form of unsupervised partitioning or modelling are particularly useful. Such an approach is only of use, however, if it offers a less complex representation of the data than the data set itself. This introduces an apparent conflict, however, as any model improves its fit to the data monotonically with increases in its complexity (the number of model parameters)  a model as complex as the data...
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
 IEEE Transactions on Information Theory
, 1998
"... The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition un ..."
Abstract

Cited by 67 (7 self)
 Add to MetaCart
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random, relative to every contemplated hypothesis and also these hypotheses are random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the log universal probability of the model plus the log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets then application of the ideal principle turns into Kolmogorov's mi...
A tutorial introduction to the minimum description length principle
 in Advances in Minimum Description Length: Theory and Applications. 2005
"... ..."