Results 1 - 10
of
203
Bayesian Interpolation
- Neural Computation
, 1991
"... Although Bayesian analysis has been in use since Laplace, the Bayesian method of model--comparison has only recently been developed in depth. In this paper, the Bayesian approach to regularisation and model--comparison is demonstrated by studying the inference problem of interpolating noisy data. T ..."
Abstract
-
Cited by 417 (17 self)
- Add to MetaCart
Although Bayesian analysis has been in use since Laplace, the Bayesian method of model--comparison has only recently been developed in depth. In this paper, the Bayesian approach to regularisation and model--comparison is demonstrated by studying the inference problem of interpolating noisy data. The concepts and methods described are quite general and can be applied to many other problems. Regularising constants are set by examining their posterior probability distribution. Alternative regularisers (priors) and alternative basis sets are objectively compared by evaluating the evidence for them. `Occam's razor' is automatically embodied by this framework. The way in which Bayes infers the values of regularising constants and noise levels has an elegant interpretation in terms of the effective number of parameters determined by the data set. This framework is due to Gull and Skilling. 1 Data modelling and Occam's razor In science, a central task is to develop and compare models to a...
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
- Data Mining and Knowledge Discovery
, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Abstract
-
Cited by 122 (1 self)
- Add to MetaCart
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction 1. Introduction Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
Separate-and-conquer rule learning
- Artificial Intelligence Review
, 1999
"... This paper is a survey of inductive rule learning algorithms that use a separate-and-conquer strategy. This strategy can be traced back to the AQ learning system and still enjoys popularity as can be seen from its frequent use in inductive logic programming systems. We will put this wide variety of ..."
Abstract
-
Cited by 118 (29 self)
- Add to MetaCart
This paper is a survey of inductive rule learning algorithms that use a separate-and-conquer strategy. This strategy can be traced back to the AQ learning system and still enjoys popularity as can be seen from its frequent use in inductive logic programming systems. We will put this wide variety of algorithms into a single framework and analyze them along three different dimensions, namely their search, language and overfitting avoidance biases.
Model Selection and the Principle of Minimum Description Length
- Journal of the American Statistical Association
, 1998
"... This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This ..."
Abstract
-
Cited by 114 (4 self)
- Add to MetaCart
This paper reviews the principle of Minimum Description Length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed interest within the statistics community. In the pages that follow, we review both the practical as well as the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines, we find many interesting interpretations of popular frequentist and Bayesian procedures. As we will see, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can co-exist and be compared. We illustrate th...
An efficient, probabilistically sound algorithm for segmentation and word discovery
- MACHINE LEARNING
, 1999
"... This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstract ..."
Abstract
-
Cited by 103 (2 self)
- Add to MetaCart
This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstractly so that the detailed component models of phonology, word-order, and word frequency can be replaced in a modular fashion. The model yields a language-independent, prior probability distribution on all possible sequences of all possible words over a given alphabet, based on the assumption that the input was generated by concatenating words from a fixed but unknown lexicon. The model is unusual in that it treats the generation of a complete corpus, regardless of length, as a single event in the probability space. Accordingly, the algorithm does not estimate a probability distribution on words; instead, it attempts to calculate the prior probabilities of various word sequences that could underlie the observed text. Experiments on phonemic transcripts of spontaneous speech by parents to young children suggest that our algorithm is more effective than other proposed algorithms, at least when utterance boundaries are given and the text includes a substantial number of short utterances.
Bayesian Models for Keyhole Plan Recognition in an Adventure Game
, 1998
"... We present an approach to keyhole plan recognition which uses a dynamic belief (Bayesian) network to represent features of the domain that are needed to identify users' plans and goals. The application domain is a Multi-User Dungeon adventure game with thousands of possible actions and locations. W ..."
Abstract
-
Cited by 99 (10 self)
- Add to MetaCart
We present an approach to keyhole plan recognition which uses a dynamic belief (Bayesian) network to represent features of the domain that are needed to identify users' plans and goals. The application domain is a Multi-User Dungeon adventure game with thousands of possible actions and locations. We propose several network structures which represent the relations in the domain to varying extents, and compare their predictive power for predicting a user's current goal, next action and next location. The conditional probability distributions for each network are learned during a training phase, which dynamically builds these probabilities from observations of user behaviour. This approach allows the use of incomplete, sparse and noisy data during both training and testing. We then apply simple abstraction and learning techniques in order to speed up the performance of the most promising dynamic belief networks without a significant change in the accuracy of goal predictions. Our experi...
Generalizing Case Frames Using a Thesaurus and the MDL Principle
- Computational Linguistics
, 1998
"... this paper, we confine ourselves to the former issue, and refer the interested reader to Li and Abe (1996), which deals with the latter issue ..."
Abstract
-
Cited by 95 (4 self)
- Add to MetaCart
this paper, we confine ourselves to the former issue, and refer the interested reader to Li and Abe (1996), which deals with the latter issue
Minimum Message Length and Kolmogorov Complexity
- Computer Journal
, 1999
"... this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 1038--1039], [2, sections 5.2, 5.5] and [3, p. 465] ..."
Abstract
-
Cited by 86 (20 self)
- Add to MetaCart
this paper is to describe some of the relationships among the different streams and to try to clarify some of the important differences in their assumptions and development. Other studies mentioning the relationships appear in [1, Section IV, pp. 1038--1039], [2, sections 5.2, 5.5] and [3, p. 465]
Cooperative Robust Estimation Using Layers of Support
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1991
"... We present an approach to the problem of representing images that contain multiple objects or surfaces. Rather than use an edge-based approach to represent the segmentation of a scene, we propose a multi-layer estimation framework which uses support maps to represent the segmentation of the image in ..."
Abstract
-
Cited by 76 (5 self)
- Add to MetaCart
We present an approach to the problem of representing images that contain multiple objects or surfaces. Rather than use an edge-based approach to represent the segmentation of a scene, we propose a multi-layer estimation framework which uses support maps to represent the segmentation of the image into homogeneous chunks. This support-based approach can represent objects that are split into disjoint regions, or have surfaces that are transparently interleaved. Our framework is based on an extension of robust estimation methods which provide a theoretical basis for supportbased estimation. The Minimum Description Length principle is used to decide how many support maps to use in describing a particular image. We show results applying this framework to heterogeneous interpolation and segmentation tasks on range and motion imagery. 1 Introduction Real-world perceptual systems must deal with complicated and cluttered environments. To succeed in such environments, a system must be able to r...
The role of Occam’s Razor in knowledge discovery
- Data Mining and Knowledge Discovery
, 1999
"... Abstract. Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor ” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite di ..."
Abstract
-
Cited by 70 (1 self)
- Add to MetaCart
Abstract. Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor ” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite different ways. The first interpretation (simplicity is a goal in itself) is essentially correct, but is at heart a preference for more comprehensible models. The second interpretation (simplicity leads to greater accuracy) is much more problematic. A critical review of the theoretical arguments for and against it shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. This article argues that its continued use in KDD risks causing significant opportunities to be missed, and should therefore be restricted to the comparatively few applications where it is appropriate. The article proposes and reviews the use of domain constraints as an alternative for avoiding overfitting, and examines possible methods for handling the accuracy–comprehensibility trade-off.

