Results 11–20 of 42
Junctions: Detection, Classification and Reconstruction
Cited by 45 (1 self)
Abstract
Junctions are important features for image analysis and form a critical aspect of image understanding tasks such as object recognition. We present a unified approach to detecting (locating the center of the junction), classifying (by the number of wedges: lines, corners, 3-junctions such as T- or Y-junctions, or 4-junctions such as X-junctions) and reconstructing junctions (in terms of radius size, the angles of each wedge and the intensity in each of the wedges) in images. Our main contribution is a model of the junction which is complex enough to handle all these issues and yet simple enough to admit an effective dynamic programming solution. Broadly, we use a template deformation framework along with a gradient criterion to detect radial partitions of the template. We use the minimum description length principle to obtain the optimal number of partitions that best describes the junction. Kona [27] is an implementation of this model. We (quantitatively) demonstrate the stabili...
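The abstract's pairing of dynamic programming with an MDL criterion can be illustrated on a simplified, non-circular version of the radial-partition problem. The sketch below is hypothetical and is not the Kona implementation: it fits a piecewise-constant function with k segments to a 1-D intensity profile by dynamic programming, then picks the k minimizing a two-part description length (the boundary and bit-cost constants are illustrative assumptions).

```python
import math

def best_rss(profile, k):
    """Minimum residual sum of squares of a piecewise-constant fit with
    k segments, found by dynamic programming over segment boundaries."""
    n = len(profile)
    s, s2 = [0.0], [0.0]                       # prefix sums of values and squares
    for v in profile:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)
    def seg_cost(i, j):                        # RSS of one constant segment over profile[i:j]
        mean = (s[j] - s[i]) / (j - i)
        return (s2[j] - s2[i]) - (j - i) * mean * mean
    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            dp[seg][j] = min(dp[seg - 1][i] + seg_cost(i, j)
                             for i in range(seg - 1, j))
    return dp[k][n]

def mdl_segment_count(profile, k_max=6, bits_per_level=8):
    """Two-part MDL: each segment pays for one boundary (log2 n bits) and
    one quantized intensity (bits_per_level bits); the residual is charged
    roughly (n/2) * log2(RSS/n) bits. Returns the k with minimal total."""
    n = len(profile)
    best_k, best_dl = None, float("inf")
    for k in range(1, k_max + 1):
        rss = best_rss(profile, k)
        dl = k * (math.log2(n) + bits_per_level) + (n / 2) * math.log2(rss / n + 1e-6)
        if dl < best_dl:
            best_k, best_dl = k, dl
    return best_k
```

A profile with three clear plateaus yields k = 3. In the paper's setting the partition is angular (circular) and the criterion is derived more carefully; this is only the shape of the idea.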
Learning Probabilistic Networks
 The Knowledge Engineering Review
, 1998
Cited by 43 (2 self)
Abstract
A probabilistic network is a graphical model that encodes probabilistic relationships between variables of interest. Such a model records qualitative influences between variables in addition to the numerical parameters of the probability distribution. As such it provides an ideal form for combining prior knowledge, which might be limited solely to experience of the influences between some of the variables of interest, and data. In this paper, we first show how data can be used to revise initial estimates of the parameters of a model. We then progress to showing how the structure of the model can be revised as data is obtained. Techniques for learning with incomplete data are also covered.
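For discrete networks, the parameter-revision step the abstract describes is typically a conjugate update of prior counts. As a minimal sketch, assuming a single binary variable with a Beta prior (the paper's own treatment is more general):

```python
def revise_parameter(prior_a, prior_b, observations):
    """Revise a Beta(prior_a, prior_b) estimate of P(X = 1) after seeing
    a list of 0/1 observations; returns the posterior mean.

    The prior pseudo-counts encode the expert's prior knowledge; the
    observed counts are simply added to them."""
    successes = sum(observations)
    a = prior_a + successes
    b = prior_b + len(observations) - successes
    return a / (a + b)
```

With a uniform Beta(1, 1) prior and observations [1, 1, 1, 0], the revised estimate is 4/6 ≈ 0.667; the pseudo-counts are exactly the "prior knowledge" that gets combined with data.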
Machine-Learning Applications of Algorithmic Randomness
 In Proceedings of the Sixteenth International Conference on Machine Learning
, 1999
Cited by 27 (14 self)
Abstract
Most machine learning algorithms share the following drawback: they output only bare predictions, not the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence, but these are, unfortunately, noncomputable. In this paper we combine the ideas of algorithmic information theory with the theory of Support Vector machines to obtain practicable approximations to universal measures of confidence. We show that in some standard problems of pattern recognition our approximations work well.
1 INTRODUCTION
Two important differences of most modern methods of machine learning (such as statistical learning theory, see Vapnik [21], 1998, or PAC theory) from classical statistical methods are that:
• machine learning methods produce bare predictions, without estimating confidence in those predictions (unlike, e.g., prediction of future observations in traditional statistics (Guttman [5], 1970));
• many machine learning ...
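The construction the authors describe evolved into what is now known as conformal prediction. The sketch below is a hypothetical nearest-neighbour variant rather than the paper's Support-Vector-based one: a nonconformity score and the p-value computed from it play the role of the practicable confidence measure.

```python
def nonconformity(xs, ys, i):
    """1-NN nonconformity score for example i: distance to the nearest
    same-label example divided by distance to the nearest
    different-label example (larger = stranger)."""
    same = min((abs(xs[i] - xs[j]) for j in range(len(xs))
                if j != i and ys[j] == ys[i]), default=float("inf"))
    diff = min((abs(xs[i] - xs[j]) for j in range(len(xs))
                if ys[j] != ys[i]), default=float("inf"))
    return same / diff if diff > 0 else float("inf")

def p_value(xs, ys, x_new, y_try):
    """Fraction of examples at least as strange as (x_new, y_try) once
    it is added to the bag; a high value means the label is credible."""
    xs2, ys2 = xs + [x_new], ys + [y_try]
    scores = [nonconformity(xs2, ys2, i) for i in range(len(xs2))]
    return sum(s >= scores[-1] for s in scores) / len(scores)
```

A point near the class-0 cluster receives a much larger p-value for label 0 than for label 1, which is exactly the graded confidence that bare predictions lack.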
Empirical Limits for Time Series Econometrics Models, unpublished
, 1998
Cited by 15 (9 self)
Abstract
This paper characterizes empirically achievable limits for time series econometric modeling and forecasting. The approach involves the concept of minimal information loss in time series regression and the paper shows how to derive bounds that delimit the proximity of empirical measures to the true probability measure (the DGP) in models that are of econometric interest. The approach utilizes joint probability measures over the combined space of parameters and observables and the results apply for models with stationary, integrated, and cointegrated data. A theorem due to Rissanen is extended so that it applies directly to probabilities about the relative likelihood (rather than averages), a new way of proving results of the Rissanen type is demonstrated, and the Rissanen theory is extended to nonstationary time series with unit roots, near unit roots, and cointegration of unknown order. The corresponding bound for the minimal information loss in empirical work is shown not to be a constant, in general, but to be proportional to the logarithm of the determinant of the (possibly stochastic) Fisher information matrix. In fact, the bound that determines proximity to the DGP is generally path dependent, and it depends specifically on the type as well as the number of regressors. For practical purposes, the
Model Selection Criteria for Learning Belief Nets: An Empirical Comparison
 In ICML’00
, 2000
Cited by 15 (2 self)
Abstract
We are interested in the problem of learning the dependency structure of a belief net, which involves a trade-off between simplicity and goodness of fit to the training data. We describe the results of an empirical comparison of three standard model selection criteria, viz., a Minimum Description Length criterion (MDL), Akaike's Information Criterion (AIC) and a Cross-Validation criterion, applied to this problem. Our results suggest that AIC and Cross-Validation are both good criteria for avoiding overfitting, but MDL does not work well in this context.
1. Introduction
In learning a model of a data-generating process from a random sample, a fundamental problem is finding the right balance between the complexity of the model and its goodness of fit to the training data. A more complex model can usually achieve a closer fit to the training data, but this may be because the model reflects not just significant regularities in the data but also minor variations due to random samp...
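The criteria being compared differ only in how they penalize the fitted log-likelihood. A minimal sketch, using the common BIC-style code length as the MDL score (the paper's exact MDL formulation for belief nets is more elaborate, and cross-validation is omitted here):

```python
import math

def aic(loglik, k):
    """Akaike's Information Criterion; lower is better."""
    return 2 * k - 2 * loglik

def mdl(loglik, k, n):
    """BIC-style two-part description length in nats; lower is better."""
    return (k / 2) * math.log(n) - loglik

# Illustrative comparison: 60 heads in n = 100 coin flips.
# Model A fixes p = 0.5 (0 free parameters); model B fits p (1 parameter).
n, heads = 100, 60
ll_fixed = n * math.log(0.5)
p_hat = heads / n
ll_fit = heads * math.log(p_hat) + (n - heads) * math.log(1 - p_hat)
```

On these counts AIC prefers the fitted model while the MDL score still prefers the fixed one: a small instance of MDL's stronger bias toward simplicity, the same tendency the abstract reports working against it for belief-net structure.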
Model Selection
 In The Handbook Of Financial Time Series
, 2008
Cited by 14 (0 self)
Abstract
Model selection has become a ubiquitous statistical activity in recent decades, not least due to the computational ease with which many statistical models can be fitted to data with the help of modern computing equipment. In this article we provide an introduction to the statistical aspects and implications of model selection and we review the relevant literature.
1.1 A General Formulation
When modeling data Y, a researcher often has available a menu of competing candidate models which could be used to describe the data. Let M denote the collection of these candidate models. Each model M, i.e., each element of M, can – from a mathematical point of view – be viewed as a collection of probability distributions for Y implied by the model. That is, M is given by M = {Pη : η ∈ H}, where Pη denotes a probability distribution for Y and H represents the ‘parameter’ space (which can be different across different models M). The ‘parameter’ space H need not be finite-dimensional. Often, the ‘parameter’ η will be partitioned into (η1, η2), where η1 is a finite-dimensional parameter whereas η2 is infinite-dimensional. In case the parameterization is identified, i.e., the map η → Pη is injective on H, we will often not distinguish between M and H and will use them synonymously. The model selection problem is now to select – based on the data Y – a model M̂ = M̂(Y) in M such that M̂ is a ‘good’ model for the data Y. Of course, the sense in which the selected model should be a ‘good’ model needs to be made precise and is a crucial point in the analysis. This is particularly important if – as is usually the case – selecting the model M̂ is not the final
Kolmogorov Complexity: Sources, Theory and Applications
 The Computer Journal
, 1999
Cited by 12 (1 self)
Abstract
...ing applications based on different ways of approximating Kolmogorov complexity.
2. BEGINNINGS
As we have already mentioned, the two main originators of the theory of Kolmogorov complexity were Ray Solomonoff (born 1926) and Andrei Nikolaevich Kolmogorov (1903–1987). The motivations behind their work were completely different; Solomonoff was interested in inductive inference and artificial intelligence and Kolmogorov was interested in the foundations of probability theory and, also, of information theory. They arrived, nevertheless, at the same mathematical notion, which is now known as Kolmogorov complexity. In 1964 Solomonoff published his model of inductive inference. He argued that any inference problem can be presented as a problem of extrapolating a very long sequence of binary symbols; 'given a very long sequence, represented by T, what is the probability that it will be followed by a ... sequence A?'. Solomonoff assumed
An Introduction to Bayesian Network Theory and Usage
, 2000
Cited by 10 (0 self)
Abstract
I present an introduction to some of the concepts within Bayesian networks to help a beginner become familiar with this field's theory. Bayesian networks are a combination of two different mathematical areas: graph theory and probability theory. So, I first give the basic definition of Bayesian networks. This is followed by an elaboration of the underlying graph theory that involves the arrangements of nodes and edges in a graph. Since Bayesian networks encode one's beliefs for a system of variables, I then proceed to discuss, in general, how to update these beliefs when one or more of the variables' values are no longer unknown (i.e., you have observed their values). Learning algorithms involve a combination of learning the probability distributions along with learning the network topology. I then conclude Part I by showing how Bayesian networks can be used in various domains, such as in the time-series problem of automatic speech recognition. In Part II I then give in more detail some ...
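The belief-updating step described above is Bayes' rule applied through the graph. A self-contained sketch on a hypothetical two-node network (Rain → WetGrass, with made-up probabilities):

```python
# Hypothetical CPTs for a two-node network Rain -> WetGrass.
P_RAIN = {True: 0.2, False: 0.8}
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}   # P(WetGrass=True | Rain)

def posterior_rain_given_wet():
    """Update the belief in Rain after observing WetGrass = True."""
    joint = {r: P_RAIN[r] * P_WET_GIVEN_RAIN[r] for r in (True, False)}
    evidence = sum(joint.values())            # P(WetGrass = True)
    return joint[True] / evidence
```

Observing wet grass raises the belief in rain from the prior 0.2 to 0.18/0.26 ≈ 0.69; larger networks perform the same computation with smarter bookkeeping (e.g., junction-tree propagation).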
Fusion of Domain Knowledge with Data for Structural Learning in Object Oriented Domains
, 2003
Cited by 9 (0 self)
Abstract
When constructing a Bayesian network, it can be advantageous to employ structural learning algorithms to combine knowledge captured in databases with prior information provided by domain experts. Unfortunately, conventional learning algorithms do not easily incorporate prior information if this information is too vague to be encoded as properties that are local to families of variables. For instance, conventional algorithms do not exploit prior information about repetitive structures, which are often found in object-oriented domains such as computer networks, large pedigrees and genetic analysis.
Embedded Bayesian Network Classifiers
, 1997
Cited by 8 (1 self)
Abstract
Low-dimensional probability models for local distribution functions in a Bayesian network include decision trees, decision graphs, and causal independence models. We describe a new probability model for discrete Bayesian networks, which we call an embedded Bayesian network classifier or EBNC. The model for a node Y given parents X is obtained from a (usually different) Bayesian network for Y and X in which X need not be the parents of Y. We show that an EBNC is a special case of a softmax polynomial regression model. Also, we show how to identify a non-redundant set of parameters for an EBNC, and describe an asymptotic approximation for learning the structure of Bayesian networks that contain EBNCs. Unlike the decision tree, decision graph, and causal independence models, we are unaware of a semantic justification for the use of these models. Experiments are needed to determine whether the models presented in this paper are useful in practice.
Keywords: Bayesian networks, model dimen...