Results 11-20 of 27
Junctions: Detection, Classification and Reconstruction
Cited by 35 (1 self)
Junctions are important features for image analysis and form a critical aspect of image understanding tasks such as object recognition. We present a unified approach to detecting (location of the center of the junction), classifying (by the number of wedges: lines, corners, 3-junctions such as T or Y junctions, or 4-junctions such as X-junctions) and reconstructing junctions (in terms of radius size, the angles of each wedge and the intensity in each of the wedges) in images. Our main contribution is a modeling of the junction which is complex enough to handle all these issues and yet simple enough to admit an effective dynamic programming solution. Broadly, we use a template deformation framework along with a gradient criterion to detect radial partitions of the template. We use the minimum description length principle to obtain the optimal number of partitions that best describes the junction. Kona [27] is an implementation of this model. We (quantitatively) demonstrate the stabili...
Machine-Learning Applications of Algorithmic Randomness
In Proceedings of the Sixteenth International Conference on Machine Learning, 1999
Cited by 23 (13 self)
Most machine learning algorithms share the following drawback: they only output bare predictions but not the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence but these are, unfortunately, non-computable. In this paper we combine the ideas of algorithmic information theory with the theory of Support Vector Machines to obtain practicable approximations to universal measures of confidence. We show that in some standard problems of pattern recognition our approximations work well. 1 INTRODUCTION Two important differences of most modern methods of machine learning (such as statistical learning theory, see Vapnik [21], 1998, or PAC theory) from classical statistical methods are that: • machine learning methods produce bare predictions, without estimating confidence in those predictions (unlike, e.g., prediction of future observations in traditional statistics (Guttman [5], 1970)); • many machine learning ...
Empirical Limits for Time Series Econometrics Models
Unpublished, 1998
Cited by 14 (9 self)
This paper characterizes empirically achievable limits for time series econometric modeling and forecasting. The approach involves the concept of minimal information loss in time series regression and the paper shows how to derive bounds that delimit the proximity of empirical measures to the true probability measure (the DGP) in models that are of econometric interest. The approach utilizes joint probability measures over the combined space of parameters and observables and the results apply for models with stationary, integrated, and cointegrated data. A theorem due to Rissanen is extended so that it applies directly to probabilities about the relative likelihood (rather than averages), a new way of proving results of the Rissanen type is demonstrated, and the Rissanen theory is extended to nonstationary time series with unit roots, near unit roots, and cointegration of unknown order. The corresponding bound for the minimal information loss in empirical work is shown not to be a constant, in general, but to be proportional to the logarithm of the determinant of the (possibly stochastic) Fisher information matrix. In fact, the bound that determines proximity to the DGP is generally path dependent, and it depends specifically on the type as well as the number of regressors. For practical purposes, the
Kolmogorov Complexity: Sources, Theory and Applications
The Computer Journal, 1999
Cited by 11 (1 self)
ing applications based on different ways of approximating Kolmogorov complexity. 2. BEGINNINGS As we have already mentioned, the two main originators of the theory of Kolmogorov complexity were Ray Solomonoff (born 1926) and Andrei Nikolaevich Kolmogorov (1903-1987). The motivations behind their work were completely different; Solomonoff was interested in inductive inference and artificial intelligence and Kolmogorov was interested in the foundations of probability theory and, also, of information theory. They arrived, nevertheless, at the same mathematical notion, which is now known as Kolmogorov complexity. In 1964 Solomonoff published his model of inductive inference. He argued that any inference problem can be presented as a problem of extrapolating a very long sequence of binary symbols; `given a very long sequence, represented by T, what is the probability that it will be followed by a ... sequence A?'. Solomonoff assumed
Model Selection Criteria for Learning Belief Nets: An Empirical Comparison
In ICML’00, 2000
Cited by 11 (2 self)
We are interested in the problem of learning the dependency structure of a belief net, which involves a tradeoff between simplicity and goodness of fit to the training data. We describe the results of an empirical comparison of three standard model selection criteria, viz. a Minimum Description Length criterion (MDL), Akaike's Information Criterion (AIC) and a Cross-Validation criterion, applied to this problem. Our results suggest that AIC and Cross-Validation are both good criteria for avoiding overfitting, but MDL does not work well in this context. 1. Introduction In learning a model of a data-generating process from a random sample, a fundamental problem is finding the right balance between the complexity of the model and its goodness of fit to the training data. A more complex model can usually achieve a closer fit to the training data, but this may be because the model reflects not just significant regularities in the data but also minor variations due to random samp...
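The complexity-versus-fit tradeoff named in this abstract can be illustrated with the textbook penalized-likelihood forms of two of the criteria. This is only a sketch and not the paper's experimental setup: the function names, the two-part MDL approximation with penalty (k/2) log N, and the candidate numbers are all assumptions chosen for illustration.

```python
import math


def aic_score(log_likelihood, num_params):
    """Akaike's Information Criterion in its usual form; lower is better."""
    return -2.0 * log_likelihood + 2.0 * num_params


def mdl_score(log_likelihood, num_params, num_samples):
    """A common two-part MDL approximation (in nats); lower is better.

    Assumption: each free parameter costs (1/2) * log(N) nats to encode.
    """
    return -log_likelihood + 0.5 * num_params * math.log(num_samples)


# Hypothetical candidate structures: the denser one fits the training
# data better (higher log-likelihood) but has more free parameters.
candidates = [
    {"name": "sparse", "loglik": -1250.0, "k": 10},
    {"name": "dense", "loglik": -1225.0, "k": 30},
]
num_samples = 1000

best_aic = min(candidates, key=lambda m: aic_score(m["loglik"], m["k"]))
best_mdl = min(
    candidates, key=lambda m: mdl_score(m["loglik"], m["k"], num_samples)
)

print("AIC prefers:", best_aic["name"])
print("MDL prefers:", best_mdl["name"])
```

With these made-up numbers the two criteria disagree, because the MDL penalty per parameter grows with the sample size while the AIC penalty does not. The example says nothing about which criterion is better in practice; that is the empirical question the paper addresses.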
An Introduction to Bayesian Network Theory and Usage
2000
Cited by 9 (0 self)
I present an introduction to some of the concepts within Bayesian networks to help a beginner become familiar with this field's theory. Bayesian networks are a combination of two different mathematical areas: graph theory and probability theory. So, I first give the basic definition of Bayesian networks. This is followed by an elaboration of the underlying graph theory that involves the arrangements of nodes and edges in a graph. Since Bayesian networks encode one's beliefs for a system of variables, I then proceed to discuss, in general, how to update these beliefs when one or more of the variables' values are no longer unknown (i.e., you have observed their values). Learning algorithms involve a combination of learning the probability distributions along with learning the network topology. I then conclude Part I by showing how Bayesian networks can be used in various domains, such as in the time-series problem of automatic speech recognition. In Part II I then give in more detail some ...
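The belief updating mentioned in this abstract can be sketched on the smallest possible network, a single edge from a parent to a child. The variables (Rain, WetGrass), the probabilities, and the function name are hypothetical and are not taken from the paper; the update itself is plain Bayes' rule by enumeration.

```python
def posterior_rain(wet_observed, p_rain=0.2,
                   p_wet_given_rain=0.9, p_wet_given_dry=0.1):
    """Return P(Rain = true | WetGrass = wet_observed) for Rain -> WetGrass."""
    # Likelihood of the observed child value under each parent value.
    like_rain = p_wet_given_rain if wet_observed else 1.0 - p_wet_given_rain
    like_dry = p_wet_given_dry if wet_observed else 1.0 - p_wet_given_dry

    # Joint probabilities of (parent value, observation).
    joint_rain = p_rain * like_rain
    joint_dry = (1.0 - p_rain) * like_dry

    # Normalize over the two parent values.
    return joint_rain / (joint_rain + joint_dry)


print("prior P(rain):", 0.2)
print("P(rain | grass wet):", posterior_rain(True))
print("P(rain | grass dry):", posterior_rain(False))
```

Under these assumed numbers, observing wet grass raises the belief in rain from 0.2 to about 0.69, and observing dry grass lowers it. Larger networks need more efficient inference than enumeration, which this sketch does not attempt.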
Fusion of Domain Knowledge with Data for Structural Learning in Object Oriented Domains
2003
Cited by 9 (0 self)
When constructing a Bayesian network, it can be advantageous to employ structural learning algorithms to combine knowledge captured in databases with prior information provided by domain experts. Unfortunately, conventional learning algorithms do not easily incorporate prior information, if this information is too vague to be encoded as properties that are local to families of variables. For instance, conventional algorithms do not exploit prior information about repetitive structures, which are often found in object oriented domains such as computer networks, large pedigrees and genetic analysis.
Embedded Bayesian Network Classifiers
1997
Cited by 9 (2 self)
Low-dimensional probability models for local distribution functions in a Bayesian network include decision trees, decision graphs, and causal independence models. We describe a new probability model for discrete Bayesian networks, which we call an embedded Bayesian network classifier or EBNC. The model for a node Y given parents X is obtained from a (usually different) Bayesian network for Y and X in which X need not be the parents of Y. We show that an EBNC is a special case of a softmax polynomial regression model. Also, we show how to identify a non-redundant set of parameters for an EBNC, and describe an asymptotic approximation for learning the structure of Bayesian networks that contain EBNCs. Unlike the decision tree, decision graph, and causal independence models, we are unaware of a semantic justification for the use of these models. Experiments are needed to determine whether the models presented in this paper are useful in practice. Keywords: Bayesian networks, model dimen...
Complexity Approximation Principle
Computer Journal, 1999
Cited by 3 (2 self)
INTRODUCTION The subject of this note is another inductive principle, which can be regarded as a direct generalization of the minimum description length (MDL) and minimum message length (MML) principles. We will describe the work started at the Computer Learning Research Centre (Royal Holloway, University of London) related to this new principle, which we call the complexity approximation principle (CAP). Both MDL and MML principles can be interpreted as Kolmogorov complexity approximation principles (as explained in Rissanen [1, 2] and Wallace and Freeman [3]; see also [4]). It is shown in [5] and [6] that it is possible to generalize Kolmogorov complexity to describe the optimal performance in different `games of prediction'. Using this general notion, called predictive complexity, it is straightforward to extend the MDL and MML principles to our more general CAP. In Section 2 we define predictive complexity, in Section 3 several examples are given and in Section 4